Statistical relevance of meteorological ambient conditions and cell attributes for nowcasting the life cycle of convective storms

The usually short lifetime of convective storms and their rapid development during unstable weather conditions make forecasting these storms challenging. It is necessary, therefore, to improve the procedures for estimating the storms' expected life cycles, including the storms' lifetime, size, and intensity development. We present an analysis of the life cycles of convective cells in Germany, focusing on the relevance of the prevailing atmospheric conditions. Using data from the radar-based cell detection and tracking algorithm KONRAD of the German Weather Service, the life cycles of isolated convective storms are analysed for the summer half-years from 2011 to 2016. In addition, numerous convection-relevant atmospheric ambient variables (e.g., deep-layer shear, convective available potential energy, lifted index), which were calculated using high-resolution COSMO-EU assimilation analyses (0.0625°), are combined with the life cycles. The statistical analyses of the life cycles reveal that rapid initial area growth supports wider horizontal expansion of a cell in the subsequent development and, indirectly, a longer lifetime. Specifically, the information about the initial horizontal cell area is the most important predictor for the lifetime and expected maximum cell area during the life cycle. Its predictive skill turns out to be moderate at most, but it is still considerably higher than the skill of any ambient variable. Of the latter, measures of midtropospheric mean wind and vertical wind shear are most suitable for distinguishing between convective cells with short lifetime and those with long lifetime. Higher thermal instability is associated with faster initial growth, thus favouring larger and longer living cells. A detailed objective correlation analysis between ambient variables, coupled with analyses discriminating groups of different lifetime and maximum cell area, makes it possible to gain new insights into their statistical connections. The results of this study provide guidance for predictor selection and advancements of nowcasting applications.


INTRODUCTION
Forecasting and nowcasting convective storms are challenging tasks for weather services. Every year, these storms cause substantial damage to infrastructure, property, and agricultural areas. In central Europe, convective storms occur most frequently from April to September, leading to damage totalling billions of euros, as well as many injuries and fatalities every year (Taszarek et al., 2019). In Germany alone, some single convective events during the past decade caused more than €100 million to occasionally more than €1 billion in damage by, for instance, local flash floods stemming from long-living quasi-stationary cells or by extended hail swaths containing large hailstones produced by long-living supercells (e.g., Piper et al., 2016; Bronstert et al., 2017; Hübl, 2017; Vogel et al., 2017; Kunz et al., 2018; Mohr et al., 2020; Wilhelm et al., 2021). Difficulties in forecasting convective storms arise from the complex processes and scale interactions involved, the lack of comprehensive and detailed observational information, and the limited representation of deep moist convection (DMC) in numerical weather prediction (NWP) models. Improvements in estimating the short-term evolution of convective storms are essential for providing temporally and spatially more accurate warnings to the public and emergency services (e.g., Wapler et al., 2018).
During the past few decades, numerous dynamical quantities and so-called convective indices quantifying different atmospheric prerequisites for the development of convective storms (thermal instability, sufficient moisture, lifting; see Doswell, 1987; Johns and Doswell, 1992) have been developed. This has allowed for the ingredients-based forecasting of storm occurrences and associated hazards (e.g., Huntrieser et al., 1997; Rasmussen and Blanchard, 1998; Haklander and van Delden, 2003; Manzato, 2003; Brooks, 2007; Kunz, 2007). Relationships between thunderstorm occurrences or different convective phenomena and ambient conditions have recently been analysed in several studies using radar, satellite, lightning, radiosonde, and model data, unravelling characteristic connections in a statistical sense (e.g., Kunz, 2007; Brooks, 2009; Kaltenböck et al., 2009; Mohr and Kunz, 2013; Púčik et al., 2015; Ukkonen et al., 2017; Westermayer et al., 2017; Kunz et al., 2020; Taszarek et al., 2021). A question that has not yet been sufficiently clarified is whether any of the ambient variables and parameters have prediction skill for the life-cycle properties of convective cells, and which of them have the highest skill. Such properties include the lifetime, the development of the cells' extent in terms of regions of high radar reflectivity and precipitation, and the development of the cells' intensity characterised, for example, by a large fraction of very high radar reflectivity. Incorporating information from these ambient variables about the expected life cycles might help with reducing the intricacies of operational nowcasting procedures. Zöbisch et al. (2020) recently presented a study on the characteristics of DMC aimed at improving thunderstorm nowcasting. Specifically, they provided a detailed review of studies investigating convective storm life cycles during the past few decades.
As (severe) thunderstorms occur in many regions of the Earth, on a broad range of scales, and under a large variety of atmospheric conditions, drawing generally valid conclusions from life-cycle studies is a challenging task. Nevertheless, improved forecasting and nowcasting techniques through improved life-cycle representation have the potential to lead to more reliable warnings, and thus save lives and prevent damage.
Common and widely used nowcasting techniques for hazards related to convective storms involve predicting their development for the next minutes to hours. These methods mainly extrapolate the track of a storm that a certain tracking algorithm has detected based on temporally highly resolved radar or satellite observations (Wang et al., 2017). Possible changes in intensity, size, propagation speed, and direction, however, are usually not taken into account with these techniques. Still, convective cells are intrinsically in a state of alteration during their life cycles (e.g., Markowski and Richardson, 2010). One of the challenges of thunderstorm nowcasting "is the improvement of predictions of the remaining lifetime of existing thunderstorms […] regardless of their organisation type" (or life-cycle phase; Zöbisch et al., 2020). The organisation type, or convective mode, however, depends on the mesoscale atmospheric ambient conditions that may be described by meteorological parameters such as the mean wind and the vertical wind shear in the lower to middle troposphere (e.g., Weisman and Klemp, 1982; Markowski and Richardson, 2010; Trapp, 2013).
Some nowcasting methods already use information about ambient conditions from the NWP model data along with observational data: German Weather Service's (DWD's) nowcasting system, NowCastMix, is used for nowcasting both summer and winter warning events (James et al., 2018). NowCastMix combines NWP forecasts, real-time weather station reports, lightning data, weather radar products, and data from convective cell detection and tracking methods with a fuzzy logic approach to produce an objective hazard warning. The ProbSevere system of the National Oceanic and Atmospheric Administration extracts and integrates data from rapid-update NWP forecasts, satellite, lightning, and radar data via multiplatform multiscale storm identification and tracking. This is done to compute severe hazard probabilities in a statistical framework using naive Bayesian classifiers for machine learning (Cintineo et al., 2020).
The Context and Scale Oriented Thunderstorm Satellite Predictors Development (COALITION-3) system of the Swiss weather service, MeteoSwiss, probabilistically estimates storm developments for the next hour on the basis of NWP, satellite, lightning, and radar data. It also takes into account the influence of orography and uses gradient-boosted decision trees for machine learning (Nisi et al., 2014; Hamann et al., 2019). For nowcasting lead times of 15 min and more, the authors found that NWP model information becomes the most important source. Recently, Mecikalski et al. (2021) presented a multisensor (satellite, lightning, and radar data) random forest approach to assess predictors' importance and the predictive skill of severe thunderstorm and tornado warnings in a multivariate framework. Because they suspected that "a severe storm adjacent to a nonsevere storm could be assigned the same NWP fields in terms of kinematic and thermodynamic fields", they did not consider the ambient conditions of the storms.
To overcome the simple extrapolation of thunderstorm tracks and to inspect the importance of ambient variables in detail, it is imperative to investigate which ambient variables are strong predictors for a sound life-cycle estimation, which may enhance today's nowcasting systems. Moreover, insights into the importance of the parameters compared with information about the cells' histories at different life-cycle stages are essential. These questions are addressed in the study at hand, where the main element is the statistical analysis of the relationships between life-cycle attributes and ambient variables. In particular, following the investigations of Wilhelm (2022), this article addresses the following two main scientific questions: (1) Under which range of prevailing ambient conditions does DMC develop, and how are the related ambient variables statistically correlated with each other and with cell attributes at the beginning of the cells' life cycles? (2) Which ambient variables and cell attributes correlate best with the storm properties of lifetime and maximum area, indicating the potential for the improvement of nowcasting procedures?
Section 2 introduces KONRAD (from the German Konvektionsentwicklung in Radarprodukten, meaning convection evolution in radar products; Lang, 2001) cell detection and tracking data, as well as Consortium for Small-scale Modelling (COSMO-EU) assimilation analyses, and it describes the methodology of the combination of object data (radar) and meteorological fields (model). Section 3 briefly discusses the convective cell properties, provides an overview of the ambient conditions prevailing during the storms detected, and quantifies the statistical correlations between them. In Section 4, the correlation of ambient variables, initial cell growth, and life-cycle characteristics (like cell lifetime and maximum cell area) is quantified, thus highlighting their potential predictive value. Section 5 summarises and discusses the most important findings.

DATA AND METHODS
The analyses presented in this study are based on two datasets. First, the life cycles of convective cells are represented by data of DWD's operational radar-based cell detection and tracking algorithm, KONRAD. Second, meteorological ambient conditions are assessed by means of assimilation analyses of DWD's formerly operational regional NWP model, COSMO-EU. The investigation period ranges from 2011 to 2016, with only the summer half-years being considered (April to September). These 6 months mark the time span when most thunderstorms occur in Germany (e.g., Wapler and James, 2015). Both tracking and model data, their preparation, and their combination are described separately in the following.

2.1 Observation data: Convective cell detection and tracking with KONRAD
As described in detail in Wapler (2021), the cell detection and tracking algorithm KONRAD utilises two-dimensional (2D) composite radar reflectivity data from the terrain-following near-surface precipitation scan (RX composite) of the German C-band weather radar network consisting of 17 operational radar stations (Mammen et al., 2010). The time and horizontal resolution of the 2D composite are 5 min and 1 km respectively. Thus, KONRAD runs operationally every 5 min, defining a convective cell numerically as a continuous area of at least 15 km² size with a radar reflectivity factor Z of 46 dBZ or more (cell area). This operationally used threshold is relatively high compared with other algorithms and studies (e.g., Goudenhoofdt and Delobbe, 2013; Nisi et al., 2018, and references therein), so considerable parts of the cells' cumulus and dissipation stages are not included. The tracking of convective cells is realised by matching the cells of two consecutive radar composites as described in Lang et al. (2003). Similar to many cell detection and tracking algorithms, KONRAD is not able to correctly handle all cell life cycles, especially when it comes to splitting or merging two or more cells. Owing to the strict numerical definition of a cell mentioned above, many splits or merges seen via KONRAD do not represent real physical cell splits or merges. Moreover, obtaining a single full convective cell track is not feasible in cases of multicells, cell clusters, or mesoscale convective systems, which appear frequently.
With this tracking data, the life cycles (i.e., object-based sequences of multiple cell attributes) of the convective cells detected are analysed, and cells that have been detected at only one point in time are discarded. Cell attributes comprise age, size, propagation direction, and speed, as well as the position of the reflectivity-weighted centroid, and the cell-enframing latitude-longitude rectangle (see Section 2.3). To avoid a priori misleading interpretations arising from the difficulties of defining the cell's full life cycle as described above, a vicinity criterion is applied: all detected cells that might have developed or dissipated close to another one (neighbour) are discarded, as are their neighbours. A sensitivity study revealed 5 km as a good compromise for the minimum required distance between the latitude-longitude rectangles of two cells so that they are not filtered out based on the vicinity criterion. Thus, the analysis presented in this study focuses primarily on isolated convective storms (i.e., single cells or supercells) that have undisturbed life cycles and that do not dynamically interact with other cells. The focus on isolated convection is reasonable: on the one hand, supercells exhibit the greatest damage potential (e.g., Kunz et al., 2018; Wilhelm et al., 2021); on the other hand, single cells represent the "most simple cases" and a sufficiently large fraction of the available data (see later). It should be noted, however, that the isolated convection might still have been influenced by weaker cells in the vicinity that did not fulfil the KONRAD criteria. The vicinity filter is combined with further filters that discard cells due to missing life-cycle information in connection with the radar coverage or with unrealistic propagation/evolution tendencies (e.g., direction change of more than 30°/5 min, smoothed over three detection times; growth of more than 50 km²/5 min).
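The vicinity criterion described above can be illustrated with a minimal sketch. It assumes, for simplicity, that the cell-enframing rectangles are given in a local Cartesian frame in kilometres at a single detection time; the `CellBox` layout and the function names are hypothetical and are not taken from KONRAD.

```python
from dataclasses import dataclass
from math import hypot

MIN_DISTANCE_KM = 5.0  # minimum rectangle separation reported in the text


@dataclass(frozen=True)
class CellBox:
    """Cell-enframing rectangle in a local Cartesian frame (km); hypothetical layout."""
    cell_id: int
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def rectangle_gap_km(a: CellBox, b: CellBox) -> float:
    """Shortest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a.x_min - b.x_max, b.x_min - a.x_max, 0.0)
    dy = max(a.y_min - b.y_max, b.y_min - a.y_max, 0.0)
    return hypot(dx, dy)


def isolated_cell_ids(boxes, min_gap_km=MIN_DISTANCE_KM):
    """Return the ids of cells with no neighbour closer than min_gap_km.

    Both members of a too-close pair are dropped, mirroring the rule that a
    cell and its neighbour are discarded together.
    """
    rejected = set()
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            if rectangle_gap_km(a, b) < min_gap_km:
                rejected.update((a.cell_id, b.cell_id))
    return {box.cell_id for box in boxes} - rejected


boxes = [
    CellBox(1, 0.0, 4.0, 0.0, 4.0),
    CellBox(2, 7.0, 10.0, 0.0, 3.0),     # 3 km east of cell 1: both are rejected
    CellBox(3, 40.0, 45.0, 40.0, 44.0),  # far from the other cells: kept
]
print(isolated_cell_ids(boxes))
```

In the study the criterion refers to the whole life cycle (development and dissipation), so an operational version would have to apply such a check at every detection time of a track.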
Of the original 103,563 life cycles, 35.5% are discarded based on the vicinity filter, 27.3% are discarded based on the other filters, and 37.2% (38,553) remain after filtering. The latter fraction seems to be low at first glance, but it is high compared with the 92.1% of organised precipitation cores in Germany that Pscheidt et al. (2019) reported from their 2-year statistics. The difference might emerge from the fact that they define organisation in terms of connectivity of single detections of precipitation cores, whereas the study at hand refers to the dynamical organisational form in terms of single cells, multicells, supercells, and mesoscale convective systems, similar to Markowski and Richardson (2010). Moreover, KONRAD uses a much higher operational reflectivity threshold, so that a direct and fair comparison is not possible.

2.2 Model data: NWP model COSMO-EU
The DWD operationally used the NWP model COSMO-EU for regional weather forecasts from the end of September 2005 to November 2016. The model covers almost all of Europe with a grid point distance of 0.0625° (≈7 km). Hence, non-hydrostatic mesoscale effects are partially captured. However, DMC still needs to be parametrised, as do many other subgrid-scale processes. Vertically, 40 hybrid terrain-following model levels are defined, ranging from 10 m above ground to a height of 21.75 km (Schulz and Schättler, 2014).
The operational workflow at the DWD is subdivided into the data assimilation cycle and the generation of NWP analyses and forecasts. Whereas COSMO-EU NWP analyses were generated four times a day, assimilation analyses were produced within the assimilation cycle each hour. Additionally, the availability of assimilation analyses is not as time critical as that of the NWP analyses needed to initialise the corresponding forecasts. This results in a later cut-off time for observations to arrive. The additional observations are then considered in the assimilation analyses, which leads to a higher analysis quality. Consequently, this study makes use of hourly assimilation analyses. Assimilation has been implemented via Newtonian relaxation (nudging), a four-dimensional procedure that pulls prognostic model variables towards observations during the forward integration of the model within a predetermined time frame (Schraff, 1996; 1997; Schraff and Hess, 2013). For the period from September 2014 to November 2016, high-resolution OPERA rain rate data, which can be derived from the reflectivity factor Z, have been assimilated into COSMO-EU via additional latent heat nudging (Stephan et al., 2008; DWD, 2014; Saltikoff et al., 2019).
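The principle of Newtonian relaxation can be illustrated with a toy scalar example. The sketch below integrates dx/dt = F(x) + G (x_obs − x) with forward Euler steps; the function names, the persistence "model", and the relaxation coefficient are assumptions chosen for illustration and do not reproduce the spatial and temporal weighting of the COSMO-EU nudging scheme.

```python
def nudge(x0, observation, model_tendency, g_per_s, dt_s, n_steps):
    """Integrate dx/dt = F(x) + G * (x_obs - x) with forward Euler steps."""
    x = x0
    for _ in range(n_steps):
        x += dt_s * (model_tendency(x) + g_per_s * (observation - x))
    return x


def no_tendency(x):
    """Toy 'model' without dynamics (persistence); purely illustrative."""
    return 0.0


# One hour of integration with 60 s time steps, observation value 10.
free_run = nudge(0.0, 10.0, no_tendency, g_per_s=0.0, dt_s=60.0, n_steps=60)
nudged = nudge(0.0, 10.0, no_tendency, g_per_s=1.0 / 1800.0, dt_s=60.0, n_steps=60)
print(free_run, round(nudged, 2))
```

Without the relaxation term the state stays at its initial value, whereas the nudged run is gradually drawn towards the observation without ever jumping to it.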
The analyses enable the calculation of a multitude of meteorological variables and convective indices that are not stored by default in the DWD database (Wilhelm, 2022). This is accomplished by extending the COSMO-EU-internal post-processing routines, initialising the model with the analyses and re-outputting the desired quantities at forecast time t = 0 for pressure and/or height levels. These quantities characterise different aspects of convective stability as well as air mass temperature, moisture content, and dynamical conditions (see Section 3.2). The newly implemented variables have been verified in detail by comparing them with radiosonde soundings, reanalyses, and the literature. Compared with similar studies from Europe or the United States in the context of convective events or their hazards on the basis of either coarse reanalyses (e.g., Brooks, 2009; Kaltenböck et al., 2009; Ukkonen et al., 2017; Westermayer et al., 2017) or proximity sounding data (e.g., Kunz, 2007; Mohr and Kunz, 2013; Púčik et al., 2015), COSMO-EU provides a temporally and spatially high-resolution representation of the atmospheric state (Miller and Mote, 2018; Zöbisch et al., 2020).

Data combination
The object-based KONRAD life cycles are combined with the meteorological ambient variables from COSMO-EU. For this purpose, the relevant 2D and three-dimensional (3D) COSMO-EU fields are first linearly interpolated to the 5-min resolution of the KONRAD data. Subsequently, for every detection time of a cell, a circular surrounding area is placed around the respective cell. The radius of this area is the sum of the radius of the smallest circle that encloses the cell-enframing latitude-longitude rectangle and an additional fixed width of R_fix = 25 km, which leads to an adaptive surrounding area depending on the size of the cell detected (Appendix A.2, Figure A3). Within that area, nine statistical measures of the ambient variables are computed at the detection time (percentiles, distance-weighted and unweighted spatial averages and standard deviations). Various sensitivity studies, conducted with the focus on the described aspects for a convectively active time period in May-June 2016 (Piper et al., 2016), confirm the suitability of the chosen settings. Note that, for example, choosing an earlier time instance (e.g., 30 or 60 min before the detection time) for the computation of the pre-convective ambient conditions or culling regions of precipitation (as regions of modified ambient conditions) has only a minor qualitative effect on the statistical measures; these options are therefore not applied. For the sake of easy reading, the indication of the statistical measure is omitted in the following, as mostly only minor differences occur in the subsequent analyses, except for standard deviations, which are marked with the superscript "(sd)". Of the available ambient variables, 33 are used for further discussion to focus on the most relevant ones and to avoid redundancies (Appendix A.2, Table A1).
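The combination steps described above can be sketched as follows. This is a minimal illustration under several assumptions: the fields live on a regular Cartesian grid in kilometres, only unweighted statistics are computed, and the percentiles shown (10th, 50th, 90th) are an arbitrary choice rather than the nine measures used in the study. All function names are hypothetical.

```python
import numpy as np

R_FIX_KM = 25.0  # fixed additional width from the text


def surrounding_radius_km(width_km, height_km, r_fix_km=R_FIX_KM):
    """Half-diagonal of the cell rectangle (smallest enclosing circle) plus R_fix."""
    return float(0.5 * np.hypot(width_km, height_km) + r_fix_km)


def interpolate_in_time(field_t0, field_t1, minutes_after_t0, step_minutes=60.0):
    """Linear interpolation between two consecutive hourly analyses."""
    w = minutes_after_t0 / step_minutes
    return (1.0 - w) * field_t0 + w * field_t1


def ambient_statistics(field, x_km, y_km, centre_xy_km, radius_km):
    """Unweighted mean, standard deviation, and some percentiles inside the circle."""
    dist = np.hypot(x_km - centre_xy_km[0], y_km - centre_xy_km[1])
    values = field[dist <= radius_km]
    return {
        "mean": float(values.mean()),
        "sd": float(values.std()),
        "p10": float(np.percentile(values, 10)),
        "p50": float(np.percentile(values, 50)),
        "p90": float(np.percentile(values, 90)),
    }


# Synthetic 7 km grid with spatially constant fields at two analysis times.
x_km, y_km = np.meshgrid(np.arange(0.0, 210.0, 7.0), np.arange(0.0, 210.0, 7.0))
cape_t0 = np.full(x_km.shape, 100.0)
cape_t1 = np.full(x_km.shape, 220.0)

cape_now = interpolate_in_time(cape_t0, cape_t1, minutes_after_t0=25.0)
radius = surrounding_radius_km(6.0, 8.0)  # 6 km x 8 km rectangle
stats = ambient_statistics(cape_now, x_km, y_km, (105.0, 105.0), radius)
print(radius, stats["mean"])
```

Because the radius grows with the half-diagonal of the rectangle, larger cells are automatically assigned a larger sampling area, while the fixed width guarantees a minimum number of model grid points even for the smallest cells.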
The selection criterion is based mainly on the discrimination ability of the variables between short- and long-living cells (see Section 4). From this, 28 of the 33 variables are obtained, among them the temperature and the equivalent potential temperature (approximation for a pseudo-adiabatic process; Bolton, 1980) at the 850 hPa level: T_850 and θ_e,850. To represent air mass temperature gradients in the cells' surroundings, the standard deviations of these temperature variables, T_850^(sd) and θ_e,850^(sd) respectively, are also considered. Moreover, the well-known lifting condensation level height (LCL_MU), the vertically integrated water vapour (IWV), and the stability index Δθ_e (Atkins and Wakimoto, 1991), here in the formulation of Kunz (2007), are included as well, yielding 33 ambient variables in total.

General cell properties
This section briefly summarises the properties of the convective life cycles without any further information about their ambient conditions. Wapler (2021) already presented the multiyear life-cycle statistics of more than 100,000 KONRAD cells between 2007 and 2017, partly focusing on the 5-year period from 2013 to 2017. They mentioned that the life-cycle properties were qualitatively very similar for their long and short periods, so the 6-year period from 2011 to 2016 with the 38,553 KONRAD cells used in this study sufficiently captures the overall statistics. It should be noted, however, that the filtering of the life cycles in this study as described in Section 2.1 differs somewhat from the methods of Wapler (2021). However, the goal is the same: to analyse the undisturbed life cycles of isolated convection. In addition to the description of the general cell properties in the following paragraphs, short life-cycle analyses in the style of Wapler (2021) are presented in Appendix A.1, which motivate the selection of cell attributes and introduce mathematical descriptions.
Owing to the different types of organisation of convective storms (single cells, multicells, supercells, and mesoscale convective systems), the life-cycle characteristics of individual cells may be strongly influenced by adjacent cells, leading to ambiguous life-cycle definitions. Cell clusters and mesoscale convective systems, consisting of multiple individual convective cells, are much more difficult to track compared with isolated convective cells. The lifetime of the embedded individual cells and their spatial extent detected by KONRAD do not allow one to draw conclusions about the main characteristics of the entire cell complex. These are the main arguments for focusing on non-splitting and non-merging isolated convection, as other storms do not disturb their individual life cycles. Still, the need exists for efficient nowcasting methods that are as general as possible to obtain realistic estimates for all future cell evolutions. It is possible that the findings on isolated convection may be transferred in such a way that nowcasting methods can be improved in the prediction of other types of convective storm organisation. We recognise, however, that this remains to be proven in future studies and that tracking-based statistical investigations of the life cycles of multicellular convection are needed to foster both a better understanding of the underlying mechanisms and a more sophisticated treatment of them in cell detection and tracking algorithms.
The 38,553 isolated storms of the investigation period of the summer half-years of 2011 to 2016 occurred in all parts of Germany (Figure 1). Cell tracks can approximately be represented by polygons derived from the cell-enframing latitude-longitude rectangle and direction information gathered through KONRAD (see Section 2.1). The regions in southern Germany and around Frankfurt am Main (50.1°N, 8.7°E) that exhibit comparatively high cell numbers are well known for the frequent occurrence of thunderstorms (e.g., Wapler and James, 2015; Piper and Kunz, 2017; Taszarek et al., 2019). Remarkably different from storm climatology is the high number of cells in northwestern Germany that can be ascribed to the convectively active period in May-June 2016 (Piper et al., 2016). The spatial distribution of the number of storm days (i.e., days with at least one storm passing by a location on a 1 × 1 km² grid) turns out to be qualitatively and quantitatively very similar to the distribution of total cell number shown in Figure 1 (i.e., days with more than one storm at a location are rare in the dataset).
The absolute frequency distributions of cell lifetime T, maximum area during the life cycle A_max (Figure 2a), and track length (not shown) appear very skewed, expressing the prevalence of short-living, small cells in the dataset (Wapler, 2021). In general, long-living cells reach comparatively higher maximum cell areas and longer tracks than short-living cells do. Nevertheless, the maximum area, propagation speed, and track length may differ significantly among long-living cells. Cells with a long lifetime of more than 60 min (1,096 cells or 2.8%) and 120 min (121 cells or 0.3%) are detected across the entire country. Note that the lifetime at the first detection is assumed to be 2 min for all cells. Owing to the strict cell definition and detection criterion of KONRAD regarding the radar reflectivity factor (see Section 2.1), which leave out considerable parts of the cells' cumulus and dissipation stages, the lifetime of a KONRAD cell can typically be considered to be shorter than the lifetime of the real convective cell. Among the 10 KONRAD cells with the longest lifetimes (up to more than 240 min) were several prominent supercells whose rotation was confirmed by eyewitnesses. Other supercells are not included owing to the filter criteria mentioned in Section 2.1, like the supercell on September 11, 2011 (Fluck, 2018), or the supercells on July 27-28, 2013.
A large fraction of the cells (that increases with storm lifetime) moved from (south)westerly to (north)easterly directions. The overall direction distribution (not shown) is well in line with other radar-based tracking studies for Germany (e.g., Wapler and James, 2015; Schmidberger, 2018) and is a consequence of the favourable conditions for convection, when moist and warm air masses from the southwestern European and western Mediterranean regions are advected to central Europe (Kapsch et al., 2012; Piper and Kunz, 2017; Mohr et al., 2019). The propagation speed c of the cells ranges from close to zero to values above 25 m⋅s⁻¹, mirroring different dynamic environments (Figure 2a). The initial cell growth is highly variable. However, the life-cycle analyses in Appendix A.1 indicate that the cell area at the first few life-cycle stages (e.g., the cell area detected 5 min after the first detection, A_t=7 min; Figure 2a) might serve as a good predictor for the cell lifetime and the maximum cell area during the life cycle. A last interesting property of the cells in the dataset is that during the life cycles of long-living cells (T > 60 min), many other cells are present across the country (Figure 2b). These simultaneous cells, which were each detected at least at one detection time of the respective long-living cells, were mostly short-lived because short-living cells dominate the dataset. Several of them occurred in similar ambient conditions in the wider vicinity of the long-living cells (i.e., their track centres were closer than 100 km to the long-living cells' track centre, but the cells did not interact), so that around 80% of the long-living cells were accompanied by at least one short-living cell each, and around 50% were accompanied by at least three short-living cells each.

Characteristics of ambient variables
The analyses presented in this section investigate the properties of the ambient variables prevailing during the occurrence of the detected convective cells but without any relation to life-cycle attributes, such as lifetime or maximum cell area. As described in Section 2.3, the subsequent evaluations focus on the 33 most relevant ambient variables. Their values reflect mean values averaged over the respective cell lifetime, as the analyses revealed that their variations along the cell track are in many cases negligibly small (see later), especially for short-living cells, which constitute most of the event set. The ambient variables comprise several parameters characterising ambient dynamics (wind and vertical wind shear), thermodynamical parameters representing convective stability, moisture and temperature quantities, characteristic levels from parcel theory (e.g., Bjerknes, 1938; Holton, 2004) and composite parameters consisting of a combination of several quantities. A complete list of the 33 variables, including their long and short names, is given in Appendix A.2 (Table A1).
The frequency distributions of seven of the 33 variables mirror the range of atmospheric conditions in which the 38,553 storms occurred (Figure 3a). Only parameters frequently used for the description of thunderstorm environments in the ingredients-based forecasting (see Section 1) are discussed: the most unstable (MU) convective available potential energy CAPE_MU, the 700-500 hPa lapse rate LR_700-500, the mixed-layer (ML; lowest 100 hPa) lifted index LI_ML, the 0 °C-level height h_0°C, the IWV, the deep-layer shear DLS, and the 0-3 km storm relative helicity SRH_0-3 calculated using the parametrisation of Bunkers et al. (2000) for right-movers. About 74% of the cells occurred in thermodynamical conditions with CAPE_MU < 500 J⋅kg⁻¹, a threshold that corresponds to a theoretical maximum updraught velocity of W_MAX = (2 CAPE_MU)^(1/2) ≈ 32 m⋅s⁻¹ (e.g., Markowski and Richardson, 2010). Only 7% of the cells evolved during high CAPE_MU with values above 1,000 J⋅kg⁻¹. High CAPE values are climatologically observed on fewer days or hours per year in central Europe compared with the United States (Brooks et al., 2003; Taszarek et al., 2020). The LR_700-500, a differential measure of midtropospheric convective instability, ranges mostly between typical values of 5.5 and 6.5 K⋅km⁻¹ (Westermayer et al., 2017), whereas larger values are represented sparsely. The values of LI_ML, another measure of convective instability, mostly range from −4 K (unstable) to +2 K (stable). A positive LI_ML does not necessarily imply a stable stratification, as an air parcel starting to rise from a higher or more unstable level than the averaged ML may still experience positive buoyancy. LI values below −5 K usually come with very high CAPE values of more than 1,000 J⋅kg⁻¹ (Westermayer et al., 2017) and are only sparsely represented in the dataset.
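The parcel-theory estimate of the maximum updraught velocity quoted above can be verified with a short calculation. The function name is hypothetical; the relation W_MAX = (2 CAPE)^(1/2) neglects entrainment, water loading, and pressure perturbations and therefore gives an upper bound.

```python
from math import sqrt


def max_updraught_speed(cape_j_per_kg):
    """Parcel-theory upper bound W_MAX = sqrt(2 * CAPE) in m/s."""
    if cape_j_per_kg < 0:
        raise ValueError("CAPE must be non-negative")
    return sqrt(2.0 * cape_j_per_kg)


for cape in (500.0, 1000.0):
    print(f"CAPE = {cape:6.0f} J/kg -> W_MAX = {max_updraught_speed(cape):.1f} m/s")
```

For the two CAPE_MU thresholds mentioned in the text, this yields roughly 32 m⋅s⁻¹ (500 J⋅kg⁻¹) and 45 m⋅s⁻¹ (1,000 J⋅kg⁻¹).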
As most of the 38,553 cells have a short lifetime (see Section 3.1), the temporal variability of the ambient variables during the life cycle is rather small compared with the general variability between the different storm environments (in terms of standard deviation). Nevertheless, this ratio of life-cycle variability to environment variability is occasionally >1 for long-lasting cells, especially for thermodynamical quantities, such as CAPE and LI, which typically vary on comparatively small scales. In addition, the ratio is sensitive to temperature and moisture variations in the near-surface tropospheric layers (Lee, 2002;Miller and Mote, 2018).
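One possible way to compute such a ratio is sketched below. The exact definition used in the study is not spelled out here, so the sketch assumes that the life-cycle variability is the standard deviation of a variable along a single cell track and that the environment variability is the standard deviation of the track-mean values across all cells. The function name and the synthetic CAPE values are hypothetical.

```python
import numpy as np


def variability_ratio(tracks):
    """Per-cell ratio of along-track SD to the SD of track means across all cells."""
    cell_means = np.array([track.mean() for track in tracks])
    environment_sd = cell_means.std()
    life_cycle_sd = np.array([track.std() for track in tracks])
    return life_cycle_sd / environment_sd


# Synthetic CAPE values (J/kg) along three tracks at consecutive detection times.
tracks = [
    np.array([300.0, 310.0]),                                # short-living cell
    np.array([700.0, 690.0, 710.0]),                         # short-living cell
    np.array([100.0, 400.0, 900.0, 1500.0, 800.0, 200.0]),   # long-living cell
]
ratios = variability_ratio(tracks)
print(np.round(ratios, 2))
```

In this synthetic example the ratio stays far below 1 for the two short tracks and exceeds 1 for the long track, which is the qualitative behaviour described in the paragraph above.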
About 19% of the cells experienced a rather strong vertical wind shear, with DLS values higher than 18 m⋅s −1 , which are favourable for potential supercell developments (Markowski and Richardson, 2010). Most of the cells occurred in rather calm to moderate dynamical conditions with DLS values below 18 m⋅s −1 , a range where extreme convection-related hazards are only occasionally observed (except for heavy rain from stationary storms, which are especially favoured by such conditions; Púčik et al., 2015; Aregger, 2021). The SRH 0-3 is mostly in the range 0-150 m 2 ⋅s −2 (Taszarek et al., 2020). In conclusion, the bulk of the cells analysed represents single cells in weak to moderate dynamical and thermodynamical conditions (Figure 3b), which can be partly explained by the application of the cell filters described in Section 2.1. Especially in warm summer air masses (850 hPa temperature T 850 higher than 10-15 °C), the IWV reaches high values mostly ranging from 25 kg⋅m −2 to more than 40 kg⋅m −2 as a consequence of the Clausius-Clapeyron relation (Figure 3c). These are remarkable values that can lead to heavy precipitation and flooding (e.g., Wilhelm et al., 2021). Regarding the spatial variability of the ambient conditions, only a few thermodynamical quantities vary conspicuously on the mesoscale (Orlanski, 1975), indicating a northwest-to-southeast gradient (e.g., the average LI ML decreases, and T 850 and h 0°C remarkably increase from northwest to southeast), which mirrors the general mean air mass distribution over Germany during summertime (not shown).

Correlation and cluster analysis
The (anti-)correlation in terms of Spearman's rank correlation coefficient r S between either two dynamical or two thermodynamical quantities is high and significant for many combinations (Figure 4; Manzato, 2012; Ukkonen and Mäkelä, 2019). The agglomerations of correlation coefficients apparent in Figure 4 can be further interpreted by non-hierarchical correlation clustering, an objective multivariate correlation approach. The k-medoids clustering, akin to the extensively used k-means clustering, is able to find clusters of correlated variables based on the dissimilarity matrix alone without knowing the space dimensionality and the positions of the variables in this space (MacQueen, 1967; Lloyd, 1982; Kaufman and Rousseeuw, 1990). The corresponding algorithm is called "partitioning around medoids", which converges with arbitrary dissimilarity metrics. Here, the measure d = 1 − |r S | serves as the dissimilarity metric: d is small for strong bivariate (anti-)correlations and is close to 1 for weak correlations (Van der Laan et al., 2003). The quality of the clustering is assessed by means of silhouette coefficients, with S → 1 representing strong structuring and S < 0 indicating the need for the improvement of the clustering (Rousseeuw, 1987). The highest total silhouette coefficient with S = 0.42 is reached for N C = 4 clusters (Figure 5). However, clustering into three to seven clusters yields very similar values (e.g., S = 0.37 for N C = 7). The clustering with N C = 4 reveals one cell (core) area cluster, one dynamical cluster at positive values of the first principal axis (mirroring the top left ambient variable agglomeration in Figure 4 including the cell propagation speed), and two clusters consisting of thermodynamical and moisture quantities. Note that analogous clustering without cell attributes yields the same composition of the three ambient variable clusters. The medoids, that is, the variables whose average dissimilarity to all variables in the four respective clusters is minimal (they can thus be considered to be their respective nuclei), are A C,t=7 min , the midtropospheric mean wind between 3 and 6 km above ground level (U 3-6 ), the vertical totals index (VT), and the ML deep convective index (DCI ML ). The clusters thereby represent the cell (core) area attributes, the midtropospheric flow, differential convective instability in the middle troposphere, and a collection of air mass temperature, moisture, and further convective instability indices. Note that the medoids do not necessarily have to look centred in the clusters in Figure 5, as only the projection onto the first two principal axes of the high-dimensional eigenspace after multidimensional scaling (Pison et al., 1999) is depicted. The three ambient variable clusters are reminiscent of the best discriminator between the high-shear low-CAPE severe and non-severe convective events of Sherburn and Parker (2014): the severe hazards in environments with reduced buoyancy (SHERB) parameter, which multiplicatively consists of a wind (shear) parameter, a midlevel lapse rate, and a low-level lapse rate.

FIGURE 5 The dissimilarity metric is d = 1 − |r S |. The projection of the clusters onto the first two principal axes of the high-dimensional eigenspace after multidimensional scaling is shown. The first principal axis explains 46.5% of the observed variability, and the second principal axis explains 17.7% of the observed variability.
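The correlation clustering described above can be sketched in a few lines of Python. The sketch rests on several assumptions that are not taken from the study: the data are synthetic (two latent factors driving six hypothetical variables), Spearman's coefficient is computed from simple ranks (valid without ties), and the medoids are found by a simple alternating assignment and update scheme with a deterministic farthest-first initialisation rather than by the full swap search of partitioning around medoids.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank correlation matrix of the columns of X (assumes no ties)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix D."""
    # Deterministic farthest-first initialisation (an assumption of this sketch)
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with minimal summed dissimilarity to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

def mean_silhouette(D, labels):
    """Mean silhouette coefficient from a precomputed dissimilarity matrix."""
    scores = np.zeros(D.shape[0])
    for i in range(D.shape[0]):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters get a silhouette of 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic example: variables 0-2 follow factor f1, variables 3-5 follow +/- f2
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 500))
signs = (1.0, -1.0, 1.0)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=500) for _ in range(3)]
    + [s * f2 + 0.3 * rng.normal(size=500) for s in signs]
)
D = 1.0 - np.abs(spearman_matrix(X))  # d = 1 - |r_S|

medoids2, labels2 = k_medoids(D, 2)
s2 = mean_silhouette(D, labels2)
print("medoids:", medoids2, "labels:", labels2, "silhouette:", round(s2, 2))
```

Using |r S | in the dissimilarity means that the anti-correlated variable (index 4) ends up in the same cluster as its positively correlated partners, which is the behaviour the metric d = 1 − |r S | is chosen for.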
Within the thermodynamical cluster, for example, the anti-correlation between the integral and differential stability measures CAPE MU and LI ML , as indicated by analyses from Westermayer et al. (2017), is strong with r S = −0.85. Additionally, the principal component analysis between W MAX and LI ML reveals that the first principal component explains more than 90% of the total variance (not shown). This is a stronger anti-correlation than that found by Manzato (2012) and Mohr and Kunz (2013) between different CAPE definitions and the LI. The differences presumably emerge from the fact that the latter studies are based on parameters stemming from proximity soundings (i.e., real radiosonde measurements), whereas the parameters here are derived from analysis data of an imperfect convection-parametrising model. Nevertheless, the model-derived parameters describe the reality sufficiently well, and have the advantage that they represent the conditions in an environment and at a time that are close to the observed storm occurrence (see Section 2.3).
Within the dynamical cluster, DLS is moderately correlated with SRH 0-3 (r S = 0.62). A very strong correlation exists between DLS and the midtropospheric mean wind U 3-6 (r S = 0.87), whereas its correlation with the lower tropospheric mean wind between 0 and 3 km U 0-3 is considerably smaller (r S = 0.45). As the correlation of DLS and the midtropospheric wind at the 500 hPa level U 500 (a variable not used for further evaluation and discussion) is even higher with r S = 0.91, it can be concluded that DLS seems to be mainly determined by the absolute value of the midtropospheric wind. All conclusions in the following sections that are drawn based on DLS could therefore be virtually attributed to the midtropospheric wind. The cell propagation speed c t=7 min exhibits a higher correlation with the vertically averaged mean wind (e.g., r S = 0.70 with U 0-6 ) than with the vertical shear (e.g., r S = 0.55 with DLS). The supercell composite parameter (SCP; Thompson et al., 2003), which is multiplicatively composed of CAPE MU , DLS, and SRH 0-3 , and which is calculated as in the formulation of Gensini and Tippett (2019), is more strongly correlated with dynamical than with thermodynamical parameters. The same applies to the spatial standard deviation of the MU bulk Richardson number BRN (sd) MU , which represents the variability of the ratio between potential and kinetic energy in the cells' surroundings (Markowski and Richardson, 2010). In contrast, the significant hail parameter (SHIP; NOAA SPC, 2014), which consists of CAPE MU , DLS, the water vapour mixing ratio, and the midtropospheric lapse rate and temperature, as used, for instance, in Prein and Holland (2018), Czernecki et al. (2019), and Tang et al. (2019), correlates more with thermodynamical quantities.
Within the cell (core) area cluster, it is apparent that cell core area A C dominates the ratio A C ∕A, which can be largely attributed to the fact that A C,t=7 min has mostly small values (e.g., 0 km 2 for approximately 60% of the cells). Interestingly, the correlations of all variables with A t=7 min are weak (|r S | ≤ 0.1) except for its correlation with SHIP, which is slightly higher. The strongest cross-cluster correlation is r S = −0.52 for IWV and LR 700-500 , which is even stronger than the correlation between IWV and the relative humidity at the 700 hPa level RH 700 . Moreover, the latter is the ambient variable with the largest number of weak correlations (|r S | ≤ 0.1) with other ambient variables. Further cross-cluster correlations with |r S | > 0.4 comprise the thermodynamical combinations of LR 700-500 with h 0°C or θ e,850 , as well as VT with LI ML or Showalter index SI. A low correlation exists, however, between DLS and LI ML (r S = 0.19), mirroring the plausible weak connection between the vertical wind shear and convective instability. The highest (anti-)correlations between a dynamical and a thermodynamical variable can be reported for the combinations of the medium-layer shear (MLS) with T (sd)
Incorporating more than the chosen 33 ambient variables into the clustering identifies that further midtropospheric thermodynamical and moisture variables can be related to the cluster at large positive values of the second principal axis. For example, building seven clusters splits the dynamical cluster into an SRH cluster and the remaining quantities, whereas from the thermodynamical cluster at negative values of the first principal axis, two smaller clusters are separated. Building even more clusters separates A from A C and A C ∕A and increasingly produces further single-variable clusters, which should be avoided. It is difficult to determine which cluster number between 3 and 7 is thus most appropriate. For the study at hand, N C = 4 is chosen for further evaluations. In conclusion, the results from the clustering procedure enable a fast and concise overview of the correlations and the correlation-based distance, respectively, between a variable and many others (compared with the correlation matrix in Figure 4). Moreover, it can serve as an objective decision basis to omit redundant information and select suitable combinations of rather independent ambient variables for multivariate analyses and forecasts of convective cell evolution.

STATISTICAL RELEVANCE OF AMBIENT CONDITIONS FOR CELL EVOLUTION
The analyses presented in this section examine the life-cycle attributes lifetime T and the maximum cell area A max and their relations with the prevailing ambient variables, as well as with the cell attributes cell area A, core area A C , their ratio A C ∕A, and propagation speed c (at specific cell ages), in the following referred to as predictors. As in Section 3, the evaluations consider the 33 most relevant ambient variables (see Appendix A.2, Table A1). The univariate analyses in Section 4.1 comprise not only investigations of the discrimination skill between short-and long-living cells for the set of predictors (Section 4.1.1) but also a similar examination of the discrimination between small and large cells with respect to their maximum cell area during the life cycle (Section 4.1.2). In addition, Section 4.1.3 presents a simple mathematical model for the description of the evolution of the cell area, considering information about an ambient variable for the example of LI ML . Bivariate analyses demonstrating quantitatively the statistical connections between cell attributes and the combinations of two predictors complete this section (Section 4.2).

Discrimination of cell lifetime classes
First, the predictors are evaluated with regard to their ability to distinguish between short- and long-living cells. The lifetime separator for the following discussion is 60 min, separating the data into 37,457 short-living (i.e., T ≤ 60 min) and 1,096 long-living (T > 60 min) cells. Generally, the separator value may be chosen arbitrarily, resulting in potentially different outcomes. Thus, the separator just defined is one possibility, focusing on the peculiarities of the ambient conditions of the 2.8% longest-living cells, whose lifetime is well above the typical values of usually not very severe single cells.
Following the methodology of Czernecki et al. (2019), an estimation of the distribution function (probability density function, PDF) of these two classes is made via the kernel density estimation according to Parzen (1962) with a Gaussian kernel and 100 interpolation points. Moreover, several scores based on categorical verification (e.g., Mason, 1982; Wilks, 2006) are considered for the evaluation of the predictor skills. These include the probability of detection POD and the probability of false detection POFD (also known as the hit and false alarm rate), the Peirce skill score PSS = POD − POFD, and the parameter d ′ , which indicates the separation of two distributions in terms of the number of standard deviations that the means of the two distributions are apart (assuming normal distributions with equal standard deviations; higher values of d ′ indicate an easier discrimination; Brooks and Correia, 2018). These scores can be depicted in the receiver operating characteristic (ROC) diagram. Moreover, the POD, the success ratio SR = 1 − FAR (where FAR is the false alarm ratio), the critical success index (CSI), and the bias B are used to describe the predictive skill in performance diagrams. For example, the fraction of long-living cells associated with a high DLS above a certain DLS threshold (hits) defines POD, whereas the fraction of short-living cells associated with a high DLS above this threshold (false alarms) defines POFD (see Table 1). The FAR characterises the reliability by giving the percentage of false alarms compared with all (short- and long-living) cells above the threshold. CSI relates the number of hits to all cells that are above the DLS threshold or long-living cells (i.e., all but the correct rejections). The bias B relates the number of forecasted long-living cells to the number of observed long-living cells. For variables with decreasing values for a longer lifetime (e.g., LI ML ), the scores are defined with respect to the reversed threshold behaviour (i.e., "above" and "below" in Table 1 are swapped). The optimal thresholds of the predictors are tested iteratively and determined such that the PSS is maximised for the ROC diagram (Manzato, 2007), and the CSI is maximised for the performance diagram. The values of the other scores in the respective diagrams relate to that same optimised threshold. The score values depend on the lifetime separator as alluded to earlier herein (Wilhelm, 2022). For example, PSS values generally increase for increasing values of the lifetime separator, mainly due to an increase in POD. CSI values generally decrease for an increasing lifetime separator due to the increasing number of cells that are defined as short-living and are not correctly assigned.

TABLE 1 Contingency table explaining hits, false alarms, misses, and correct rejections for the analysis of lifetime discrimination, using the example of the deep-layer shear (DLS) as predictor.

                       Long-living    Short-living
Above DLS threshold    Hit            False alarm
Below DLS threshold    Miss           Correct rejection
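The categorical scores defined above, and the iterative search for an optimal predictor threshold, can be illustrated with a short Python sketch. The function names and the ten-cell toy sample are hypothetical; only the counts of the reference forecast (1,096 long-living and 37,457 short-living cells) are taken from the text.

```python
def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def scores(hits, false_alarms, misses, correct_rejections):
    """Categorical verification scores from a 2x2 contingency table."""
    pod = _ratio(hits, hits + misses)
    pofd = _ratio(false_alarms, false_alarms + correct_rejections)
    far = _ratio(false_alarms, hits + false_alarms)
    return {
        "POD": pod,
        "POFD": pofd,
        "PSS": pod - pofd,
        "FAR": far,
        "SR": 1.0 - far,
        "CSI": _ratio(hits, hits + false_alarms + misses),
        "B": _ratio(hits + false_alarms, hits + misses),
    }

def contingency(values, is_long, threshold):
    """Count hits, false alarms, misses, correct rejections for 'value > threshold'."""
    hits = sum(1 for v, e in zip(values, is_long) if v > threshold and e)
    fa = sum(1 for v, e in zip(values, is_long) if v > threshold and not e)
    misses = sum(1 for v, e in zip(values, is_long) if v <= threshold and e)
    return hits, fa, misses, len(values) - hits - fa - misses

def best_threshold(values, is_long, key):
    """Threshold among the observed values that maximises the score named by key."""
    return max(sorted(set(values)),
               key=lambda thr: scores(*contingency(values, is_long, thr))[key])

# Reference forecast that always predicts "long-living" (counts from the text)
ref = scores(hits=1096, false_alarms=37457, misses=0, correct_rejections=0)
print(f"reference forecast: PSS = {ref['PSS']:.2f}, CSI = {ref['CSI']:.3f}")

# Hypothetical toy sample: predictor values and long-living flags
values = [5, 8, 9, 11, 12, 14, 15, 18, 20, 25]
is_long = [False, False, False, False, True, False, False, True, True, True]
thr_pss = best_threshold(values, is_long, "PSS")
thr_csi = best_threshold(values, is_long, "CSI")
best = scores(*contingency(values, is_long, thr_pss))
print(f"PSS-optimal threshold = {thr_pss}, PSS = {best['PSS']:.2f}")
```

For predictors such as LI ML that decrease with lifetime, the comparison in `contingency` would simply be reversed, in line with the swapped rows of Table 1.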
Note that such a categorical evaluation forms a hard decision boundary, where, for example, cells with a lifetime of 57 min, which are forecasted to be long-living, are rated as wrongly forecasted. The detailed PDFs from the kernel density estimation of three exemplary predictors A t=7 min , DLS, and LI ML , separately for short- (T ≤ 60 min) and long-living cells (T > 60 min), illustrate that the higher the PSS is, the lower is the resulting overlap of the PDFs (see Figure 6a-c). The PDFs contain all 38,553 cells with a minimum lifetime of 7 min. In general, the optimal variable threshold yielding the highest PSS value differs from the one for the highest CSI value.
However, the overlap between the distributions for all variables is remarkably high and PSS and CSI values low, indicating only a weak discrimination and prediction skill. This will be discussed in detail in the next paragraph by means of the ROC and performance diagrams, where the different score metrics are summarised for the best variables of the four respective clusters from Figure 5 (i.e., only the results for one variable per cluster maximising PSS and CSI are shown for the sake of clarity). The dependence of the scores on the minimum cell age (i.e., on the time point when the lifetime estimation is made) will also be investigated. Individual score values and variable thresholds for all predictors are listed in the Supporting Information (Tables S1 and S2).
The ROC and performance diagrams clearly show that the discrimination and prediction skill of all clusters is rather low (Figure 7). However, the (blue) cell (core) cluster achieves the best scores, followed by the (green) dynamical cluster. The (red) thermodynamical and the (purple) midtropospheric clusters show very little skill. The best predictor in terms of PSS (ranging from 0.28 to 0.38 depending on the minimum cell age) and CSI (ranging from 0.07 to 0.38) is cell area A (see Supporting Information Tables S1 and S2), represented by the markers of the cell (core) cluster. Within the dynamical cluster, several variables, such as DLS, SCP, MLS, and U 3-6 , reach similar score values, whereas cell propagation speed c shows slightly less skill. None of the ambient variables exceeds PSS = 0.18. Owing to the majority of cells with short lifetimes, the CSI and SR values are generally rather small. This is especially true for high lifetime separator values, like 60 min, because many more wrong assignments of the short-living cells exist in absolute numbers compared with the correct assignments of the 1,096 long-living cells. Generally, the scores indicate only a fair discrimination and a low reliability for deriving the cell lifetime from a specific predictor during the first 37 min of the life cycle. However, CSI and SR increase with increasing cell age (e.g., for cell area A from CSI = 0.07 for predictions at t = 7 min, to CSI = 0.38 for predictions at t = 37 min), indicating a slightly more reliable estimation when the forecast time point approaches the lifetime separator of 60 min. Only the predictors from the cell (core) cluster reach CSI values that are around 0.05-0.10 higher than those from a reference forecast, which always predicts long-living cells (Supporting Information Table S2). Thus, the gain in forecast performance is small when information about a single ambient variable is considered.
Interestingly, the discrimination skill of the ambient variables in terms of PSS decreases slightly with increasing cell age (as does the bias B), whereas the skill for A (and A C ) increases. Hence, a strong initial cell growth and intensification seems to have positive effects on the cell lifetime (see Appendix A.1). Moreover, this finding suggests that, with increasing cell age, information about the cell history becomes more important compared with the ambient variables.
The optimal variable thresholds in terms of PSS, as exemplarily shown for A t=7 min , DLS, and LI ML in Figure 6a-c, mostly do not change much with increasing cell age (not shown) due to the minority of the 1,096 long-living cells in the statistics (i.e., the majority of the short-living cells dominate the threshold determination). The thresholds for cell area A range from 26 to 30 km 2 , and the thresholds for dynamical variables represent moderate wind speed and shear conditions (e.g., DLS and U 3-6 are around 12 m⋅s −1 ), whereas the thermodynamical indices VT, KO index, SI, and LI ML lie in the range of slightly unstable stratification (e.g., LI ML and SI are around −1 K). The optimal variable thresholds in terms of CSI, also shown in Figure 6a-c for a few examples, change towards values that are less favourable for a long lifetime with increasing minimum cell age (e.g., thresholds for A, DLS, and U 3-6 decrease and for LI ML and SI increase). The better performance of the dynamical variables compared with the other two ambient variable clusters suggests that they have the best discrimination skill with respect to the lifetime due to their influence on the degree of cell organisation. This finding is in line with theoretical considerations in classical textbooks (e.g., Markowski and Richardson, 2010; Trapp, 2013) and corresponds to a higher probability of severe weather occurring in dynamically active environments (Taszarek et al., 2020), which is mostly attributed to long-living convective storms. Interestingly, this result contrasts with the recent finding of Zöbisch et al. (2020), who found no connection between DLS and cell lifetime for a set of satellite-based thunderstorm detections over central Europe. They mentioned, however, that this might originate from the fact that they deliberately did their analysis without filtering for complete undisturbed cell life cycles in order to represent the full convective spectrum.

Discrimination of cell area classes
For the investigation of the discrimination skill of the predictors with regard to classes of the maximum cell area, a similar split of the data as for the cell lifetime in Section 4.1.1 is performed by separating the 1,052 largest cells (2.7%), which reached a maximum cell area A max of more than 60 km 2 . For the prediction times 17 min, 27 min, and 37 min, only 1,043, 979, and 867 large cells respectively remain, as a few large cells had a short lifetime. As for the lifetime discrimination, individual score values and variable thresholds for all predictors are listed in the Supporting Information (Tables S3 and S4).
From Figure 8a, it can be seen that here, the cell (core) cluster shows a high discrimination skill (PSS ranges between 0.49 and 0.59), much higher than all ambient variables, resulting in less overlap of the corresponding PDFs (see examples in Figure 6d-f). The threshold of A that best discriminates the maximum cell area is around A(t) = 30-38 km 2 for most prediction time points (see Figure 6d; Supporting Information Table S3). CSI and SR reach higher values than for the lifetime estimation but saturate after t = 17 min (i.e., the predictions after 27 and 37 min are not improving much; Figure 8b). The bias B is close to 1 for all prediction times. Like the cell area, the core area A C and the ratio A C ∕A reach better score values than all ambient variables do (Supporting Information  Tables S3 and S4). In summary, these results indicate that a cell should grow strongly already during the first life-cycle stages to reach a large area during its life cycle as also Figure A1 in Appendix A.1 suggests.
In contrast to lifetime discrimination (see Section 4.1.1), when it comes to distinguishing small and large cells, thermodynamical indices reach PSS values comparable to the dynamical variables (Figure 8a; Supporting Information Table S3). Higher instability is conducive for free buoyancy-driven convection, which favours the growth of convective cells. In particular, the combined parameters SCP and SHIP, consisting of both thermodynamical and dynamical quantities, show the best discrimination but with relatively small variable thresholds below 0.25. Such values are observed very often in thunderstorm environments. PSS hardly exceeds 0.20, B is somewhat closer to 1 for all clusters, and CSI reaches values comparable to the lifetime discrimination (which are again very small when compared with the reference forecast, which always predicts cells with a large maximum cell area; Figure 8b; Supporting Information Table S4). The optimal variable thresholds for DLS and LI ML do only differ slightly from the ones for the lifetime discrimination (Figure 6e,f; Supporting Information Tables S3 and S4).
Thus, compared with the skill for lifetime discrimination, the cell attributes are of higher relative importance than ambient variables are, which only show a weak statistical relationship with both targets lifetime T and maximum cell area A max . Nevertheless, most scores indicate a somewhat higher skill for the discrimination of small and large cells for all ambient variable clusters. This is reversed for the dynamical cluster, however, when more extreme lifetime and cell area groups are chosen; for example, with a lifetime separator of 100 min and a maximum-area separator of 80 km 2 (taking into account the equal division of the groups of lifetime and maximum area; not shown). In that case, the separate results for the lifetime and maximum cell area discrimination do not change much qualitatively, and the findings for the predictors can be interpreted similarly to the explanations earlier herein.

FIGURE 8 As Figure 7, but for the maximum cell area separation with a separator of 60 km 2 . The best predictors of the respective clusters can be read from Supporting Information Tables S3 and S4 (underlined, bold score values). POD, probability of detection; POFD, probability of false detection; SR, success ratio.

4.1.3 A simple life-cycle model with one ambient variable
Despite the weak statistical relationships between ambient variables and the targets cell lifetime T and maximum cell area A max discussed earlier herein, a potentially useful approach for the integration of an ambient variable in a simple life-cycle model is presented in this section. As shown in Wapler (2021), the temporal evolution of the mean cell area A can suitably be approximated by a parabola opened downwards. In Appendix A.1, their findings are briefly recapped, and a mathematical parabola formulation is introduced, which the following derivations build upon.
The mathematical parabola model introduced in Equation (A1) (see Appendix A.1) describing this evolution can be refined by adding an ambient variable as second form parameter u. The model then reads as follows:

$$A^{(T,u)}(t) = A^{(T,u)}_{\min} + 4\left(A^{(T,u)}_{\max} - A^{(T,u)}_{\min}\right)\frac{t}{T}\left(1 - \frac{t}{T}\right). \tag{1}$$

Minimum cell area A (T,u) min can conveniently be assumed to be independent of T and u: A (T,u) min = A 0 . The mean wind and vertical wind shear expressed by the MLS, DLS, or one of the SRHs exhibit no clear monotonic statistical relationship to the maximum amplitude A (T,u) max − A (T,u) min ≡ Â (T,u) , whereas thermodynamical variables like, for example, LI ML , SI, KO index, or LR 700-500 , can be incorporated via Â (T,u) = c A (u)T using a linear assumption for c A (u). Equation (A2) is then transformed into

$$A^{(T,u)}(t) = A_0 + 4\,c_A(u)\,t\left(1 - \frac{t}{T}\right). \tag{2}$$

A linear regression for LI ML (in kelvin) yields c A (LI ML ) ≈ (0.351 − 0.020LI ML ) km 2 ⋅min −1 with a root-mean-squared error of RMSE = 0.03 km 2 ⋅min −1 , which leads to differences compared with the parabola model without LI ML information of the order of 10 km 2 and more for the same lifetime T. Equation (2) leads to an estimate of the maximum cell area, which is given by

$$A^{(T,u)}_{\max} = A_0 + c_A(u)\,T \tag{3}$$

and is also valid without u-dependence. As an example (called Example 1) without an ambient variable, for T = 60 min, a maximum area of A (60 min) max ≈ 39 km 2 is obtained (see Figure A2a). With dependence on LI ML , one finds with Equation (3), considering the uncertainty in the calculation of the regression coefficient c A (LI ML ) ± 0.03 km 2 ⋅min −1 , that there are differences worth mentioning (Figure 9a): the mean maximum area is around 10 km 2 larger for high instability (low LI ML values) compared with stable conditions (high LI ML values) for the same lifetime T. This difference is larger than the uncertainty arising from the calculation of c A (LI ML ). Moreover, it becomes clear that the original parabola model from Equation (A2) is equivalent to the parabola model from Equation (2) for LI ML ≈ 2.8 K, indicating that the maximum cell area is underestimated by the original parabola model in neutral and unstable conditions. Conversely, solving Equation (2) for T yields estimates for the lifetime T as a function of LI ML , given the cell area A(t) at a specific cell age t:

$$T = t\left[1 - \frac{A(t) - A_0}{4\,c_A(\mathrm{LI_{ML}})\,t}\right]^{-1}. \tag{4}$$

As a second example (called Example 2), one clearly sees for a given cell area A(t) = 38 km 2 at t = 17 min (corresponding to the optimal discrimination threshold regarding the maximum cell area A max ; see Section 4.1.2) that the lower the LI ML is (i.e., the more unstable the environment is), the shorter is the remaining lifetime and the smaller is the maximum cell area (Figure 9b,c). The interpretation is as follows.
For LI ML values in the unstable range of less than 0 K, a cell with A t=17 min = 38 km 2 has not grown strongly for some reason considering the good thermodynamical conditions. It is therefore expected not to intensify further (i.e., to have a short lifetime and a small maximum area). Another interesting aspect is that the uncertainty ranges substantially increase, when the chosen cell area A(t) approaches the theoretical limit of the parabola model A (LI ML ) crit (t) = A 0 + 4c A (LI ML )t (see Appendix A.1). Uncertainties arising from other sources, like the inaccuracy of the definition of the cell area or cell age, caused, for example, by shadowing effects in the radar data or the KONRAD cell definition, are, of course, not included in the presented uncertainty ranges.
Separate statistics of cells occurring in rather unstable (LI ML < −1 K) and rather stable (LI ML ≥ −1 K) conditions depict these differences well ( Figure 10). As mentioned in the preceding discrimination analysis (Sections 4.1.1 and 4.1.2), values around −1 K are discriminating best between long-and short-living cells, as well as between cells with small and large maximum cell areas in terms of PSS (see Supporting Information Tables S1 and S3). The fraction of long-living cells with T > 60 min is 3.4% for unstable conditions and 2.4% for rather stable conditions. The areas of cells occurring in rather unstable conditions grow faster than those for cells occurring in more stable conditions. The lower the LI ML values are the higher is the convective instability, and thus the possibility for rapid growth through free convection. Nevertheless, large overlapping areas remain between the curves of different lifetimes as well as between the high-LI and low-LI curves, representing the large variability of individual cell life cycles. The limited sample size makes the incorporation of more than one ambient variable less confident, as the representativeness of the multivariate regression in the calculation of c A degrades with higher dimensionality. This deficiency might shrink for longer study periods.
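The behaviour of the life-cycle model discussed in this section can be explored numerically. The following Python sketch assumes a symmetric, downward-opening parabola with A = A 0 at t = 0 and t = T and a maximum amplitude of c A (LI ML )T, which is consistent with the theoretical limit A crit (t) = A 0 + 4c A (LI ML )t stated above. The regression coefficients are those quoted in the text, whereas the function names and the value A0 = 21 km 2 are purely illustrative assumptions.

```python
A0 = 21.0  # km^2; hypothetical minimum cell area, not taken from the text

def c_a(li_ml):
    """Growth coefficient (km^2/min) from the linear regression quoted in the text."""
    return 0.351 - 0.020 * li_ml

def cell_area(t, lifetime, li_ml, a0=A0):
    """Assumed parabola: area at cell age t (min) for a given lifetime (min)."""
    return a0 + 4.0 * c_a(li_ml) * t * (1.0 - t / lifetime)

def max_area(lifetime, li_ml, a0=A0):
    """Maximum area, reached at t = lifetime / 2."""
    return a0 + c_a(li_ml) * lifetime

def critical_area(t, li_ml, a0=A0):
    """Upper limit of the area at age t (lifetime -> infinity)."""
    return a0 + 4.0 * c_a(li_ml) * t

def lifetime_from_area(area, t, li_ml, a0=A0):
    """Invert the parabola for the lifetime, given the area at cell age t."""
    growth = 4.0 * c_a(li_ml) * t
    excess = area - a0
    if excess >= growth:
        raise ValueError("area reaches or exceeds the critical area of the model")
    return t / (1.0 - excess / growth)

# Difference in maximum area between unstable and stable conditions (T = 60 min)
delta = max_area(60.0, -4.0) - max_area(60.0, 2.0)
print(f"A_max(LI=-4 K) - A_max(LI=+2 K) = {delta:.1f} km^2")

# Lifetime estimate for A = 38 km^2 at t = 17 min under different instability
for li in (-4.0, 0.0, 2.0):
    est = lifetime_from_area(38.0, 17.0, li)
    print(f"LI_ML = {li:+.0f} K -> estimated lifetime = {est:.0f} min")
```

The difference in maximum area does not depend on the assumed A0, whereas the lifetime estimates do, and they grow rapidly as the given area approaches `critical_area`, which mirrors the widening uncertainty ranges described in the text.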

Bivariate analyses
Separating the dataset into classes of short- and long-living or small and large cells, as already described in Sections 4.1.1 and 4.1.2, and comparing their PDFs as a function of two ambient variables reveal their combined statistical connections. As in Figure 6, the findings for the cell area A t=7 min , DLS, and LI ML , which represent three different predictor clusters (see Figure 5), are illustrated as examples (Figure 11). The evaluation scores are again summarised in the ROC and performance diagrams (Figure 12). The results are not depicted for all possible bivariate predictor combinations, but only for a selection: all bivariate predictor combinations are considered where the predictors originate from two different clusters. From these predictor combinations, only those are drawn that reach the highest PSS and CSI for their respective specific cluster combination. This is done independently for all four different prediction times. Similar to the univariate procedure described in Section 4.1.1 for the determination of the optimal variable threshold, thresholding lines in the 2D variable spaces have optimally been calculated in an iterative procedure with 100 repetitions, here by means of a linear discriminant analysis.

FIGURE 11 Scatter plot of all 38,553 cells and comparison of two-dimensional probability density functions from kernel density estimation for (a, b) cell separation with a lifetime separator of 60 min in short lifetime (blue hues; N = 37,457) and long lifetime (red hues; N = 1,096), and (c, d) cell separation with a maximum-area separator of 60 km 2 in small maximum cell area (blue hues; N = 37,443) and large maximum cell area (red hues; N = 1,052), as a function of (a, c) the combination of deep-layer shear (DLS) and cell area A t=7 min , as well as (b, d) DLS and mixed-layer lifted index LI ML . The contour lines indicate the respective 0.25 (solid, red/dark blue), 0.75 (solid, orange/light blue) and 0.95 (dashed) frequency levels; for example, 75% of the long-living or large cells are located within the respective solid orange contour. Similar to Figure 6, the black and green lines indicate the optimal thresholds according to the Peirce skill score and critical success index respectively, based on a linear discriminant analysis.
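How a thresholding line in a two-predictor space can be obtained from a linear discriminant analysis is sketched below in Python. The sketch makes several assumptions: it uses Fisher's linear discriminant for the direction of the line and then scans the offset along that direction for the highest PSS, it does not reproduce the 100-repetition iterative procedure of the study, and the two synthetic predictors (labelled DLS and cell area only for orientation) are random numbers, not data from the analysis.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher discriminant direction w for a boolean class label y."""
    X0, X1 = X[~y], X[y]
    # Pooled within-class scatter matrix
    Sw = ((len(X0) - 1) * np.cov(X0, rowvar=False)
          + (len(X1) - 1) * np.cov(X1, rowvar=False))
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def best_pss_offset(proj, y):
    """Offset along the projected axis that maximises PSS = POD - POFD."""
    best_pss, best_thr = -np.inf, None
    for thr in np.unique(proj):
        pred = proj > thr
        pss = pred[y].mean() - pred[~y].mean()
        if pss > best_pss:
            best_pss, best_thr = pss, thr
    return best_pss, best_thr

# Synthetic predictors: column 0 ~ "DLS" (m/s), column 1 ~ "cell area" (km^2)
rng = np.random.default_rng(1)
short = rng.normal(loc=[10.0, 20.0], scale=[4.0, 8.0], size=(2000, 2))
long_ = rng.normal(loc=[16.0, 32.0], scale=[4.0, 8.0], size=(100, 2))
X = np.vstack([short, long_])
y = np.r_[np.zeros(2000, dtype=bool), np.ones(100, dtype=bool)]

w = fisher_direction(X, y)
pss, offset = best_pss_offset(X @ w, y)
# The thresholding line is the set of points x with w[0]*x[0] + w[1]*x[1] = offset
print(f"w = {w}, offset = {offset:.4f}, PSS = {pss:.2f}")
```

Replacing PSS by CSI in `best_pss_offset` would give the second type of thresholding line, which in such an imbalanced sample tends to lie at predictor values further inside the long-living class.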

F I G U R E 12 Similar to
The cell area A t=7 min and DLS as predictors have a combined discrimination skill that is mirrored by a shift in the dense core area of the 2D PDFs from kernel density estimation toward higher cell area and DLS values for the long-living and large cell group respectively (Figure 11a,c). Compared with the univariate score values of A, the combination with DLS leads only to a minor improvement for the lifetime estimation. Depending on the prediction time point, PSS increases only by 0.01 to 0.02, and CSI increases by 0.01 at most (not shown). For the estimation of A max , these metrics are not improved. The combination of A and DLS reaches scores very close to the best variable combinations from the cell (core) cluster and the dynamical cluster.
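The bivariate thresholding described above can be illustrated with a minimal sketch. It assumes a two-class Fisher linear discriminant in a 2D predictor space, followed by a scan of the offset along the discriminant direction that maximises either the Peirce skill score (PSS) or the critical success index (CSI). The function names and the toy data are hypothetical, and the iterative procedure with 100 repetitions used in the study is not reproduced here.

```python
def scores(hits, misses, false_alarms, correct_negatives):
    """Return (PSS, CSI) for a 2x2 contingency table."""
    pod = hits / (hits + misses)                            # probability of detection
    pofd = false_alarms / (false_alarms + correct_negatives)  # probability of false detection
    csi = hits / (hits + misses + false_alarms)
    return pod - pofd, csi


def fisher_direction(x0, x1):
    """Fisher/LDA direction w = S_w^-1 (mean1 - mean0) for lists of 2D points."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        return sxx, sxy, syy

    m0, m1 = mean(x0), mean(x1)
    a0, b0, c0 = scatter(x0, m0)
    a1, b1, c1 = scatter(x1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1   # pooled within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Apply the inverse of the 2x2 scatter matrix to the mean difference.
    return ((c * dx - b * dy) / det, (a * dy - b * dx) / det)


def best_offset(x0, x1, w, score_index):
    """Scan offsets along w; return (best score, offset). score_index: 0=PSS, 1=CSI."""
    proj0 = [w[0] * p[0] + w[1] * p[1] for p in x0]
    proj1 = [w[0] * p[0] + w[1] * p[1] for p in x1]
    best = None
    for thr in sorted(set(proj0 + proj1)):
        hits = sum(v >= thr for v in proj1)
        false_alarms = sum(v >= thr for v in proj0)
        s = scores(hits, len(proj1) - hits,
                   false_alarms, len(proj0) - false_alarms)[score_index]
        if best is None or s > best[0]:
            best = (s, thr)
    return best


# Made-up (cell area in km^2, DLS in m/s) pairs, for illustration only.
short_cells = [(20.0, 5.0), (22.0, 8.0), (25.0, 6.0), (21.0, 12.0), (30.0, 7.0)]
long_cells = [(35.0, 14.0), (40.0, 18.0), (28.0, 20.0), (45.0, 11.0)]

w = fisher_direction(short_cells, long_cells)
pss, offset_pss = best_offset(short_cells, long_cells, w, 0)
csi, offset_csi = best_offset(short_cells, long_cells, w, 1)
print(w, pss, offset_pss, csi, offset_csi)
```

The thresholding line in the 2D space is then the set of points with w[0]·x + w[1]·y equal to the selected offset. For strongly imbalanced classes, as in the dataset at hand, the offsets maximising PSS and CSI generally differ.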
In general, when comparing Figure 7 with Figure 12a,b and Figure 8 with Figure 12c,d, it is apparent that the bivariate scores are not much better than the univariate scores are. The greatest improvements can be seen for combinations of variables from two ambient variable clusters. As an example, the combination of DLS and LI ML (see also Figure 11b,d) is in many cases the best ambient variable combination (or very close to it): for the lifetime estimation, the PSS increases by up to 0.03, and the CSI increases by up to 0.01. For the maximum cell area estimation, the highest increases are by 0.10 for the PSS and by 0.04 for the CSI. However, combinations of any ambient variable cluster with the cell (core) cluster achieve the highest skill. Combinations of two clusters seem to improve the biases in many cases compared with the univariate values. As a last note, the optimal thresholding lines for PSS divide the high-frequency regions of the PDFs (areas within the solid blue and red lines in Figure 11) well, whereas the lines for CSI are located at predictor values that are more favourable for a long cell lifetime and a large maximum area (similar to the univariate case; see Figure 6). For these thresholding lines, the number of missed events is as high as or even higher than the number of hits.

As could be expected from the previous analyses, the mean lifetime increases with increasing vertical wind shear and instability, as well as with increasing cell and core area after 7 min. The mean lifetime differences between different predictor ranges are higher for predictor combinations with the cell area A (25-30 min; Figure 13a,b) than for combinations of ambient variables solely (10-15 min; Figure 13c; Figure A4). These differences are higher than those for grouping by only one of the ambient variables, leading to a difference of at most 7-10 min (not shown).
The standard deviations (Figure 13d-f) are approximately of the same order as the mean lifetime differences, meaning that the latter should be interpreted carefully. The high standard deviations originate, inter alia, from the fact that, on days with convection-favouring conditions, not only do long-living cells develop but also many short-living cells (see Figure 2b).
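As a minimal sketch of how such group statistics can be obtained, the following example bins cells by two predictors and returns the sample size, mean lifetime, and standard deviation per group. The bin edges, the function name, and the toy values are hypothetical, and the population standard deviation is used here purely for simplicity.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, pstdev


def grouped_lifetime_stats(cells, dls_edges, area_edges):
    """Group (dls, area, lifetime) tuples by predictor bins.

    Returns {(dls_bin, area_bin): (n, mean_lifetime, std_lifetime)}.
    Bin index k means the value lies between edges[k-1] and edges[k].
    """
    groups = defaultdict(list)
    for dls, area, lifetime in cells:
        key = (bisect_right(dls_edges, dls), bisect_right(area_edges, area))
        groups[key].append(lifetime)
    return {k: (len(v), mean(v), pstdev(v)) for k, v in groups.items()}


# Made-up cells: (DLS in m/s, cell area in km^2, lifetime in min).
toy_cells = [(5, 18, 15), (6, 20, 25), (18, 40, 45), (20, 45, 65), (17, 19, 20)]
stats = grouped_lifetime_stats(toy_cells, dls_edges=[10], area_edges=[30])
for key in sorted(stats):
    print(key, stats[key])
```

Comparing the mean difference between two groups with the standard deviations within them gives a quick impression of how strongly the group distributions overlap.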
Very similar results apply qualitatively to the maximum cell area differences and respective standard deviations, albeit the relative differences of the maximum cell area are somewhat higher (Figure 14). The mean maximum area increases from around 20 km² for weak initial cell growth to more than 50 km² for strong initial growth (Figure 14a). The cell growth is more decisive for the maximum area than DLS is (Figure 14b).
High vertical wind shear combined with convective instability, however, yields a maximum area of around 35 km², whereas cells grow only slightly (up to around 20-25 km²) during stable conditions with little shear (Figure 14c).
When considering only cells that reached a lifetime of at least 30 min, the mean lifetime and maximum area increase for the groups that are conducive to longer lifetimes/larger areas owing to the larger relative frequency of the long-living and large cells in these groups. For example, at high DLS (>15 m⋅s⁻¹) and low LI ML (less than −1 K) values, the lifetime is expected to be around 55-60 min, and the maximum area is expected to be around 55-60 km², which is 20-25 min longer and 20-25 km² larger than at low DLS and high LI ML values (not shown). Thus, the lifetime and maximum area differences between different predictor groups increase when short-living cells with a lifetime of less than 30 min are not taken into account for the evaluation. This is in line with the increased forecast performance in terms of CSI for later prediction times in the categorical analyses earlier herein.
The investigations of combined dependencies with more than two predictors hardly show groups with sufficiently large sample sizes for establishing robust statistical relations (not shown). A clear relationship could not be extracted because too many subspaces of the multidimensional space remain uncovered. Advanced statistical and machine-learning methods offer ways in which to achieve a potentially better predictive skill in the future by using more than two predictors simultaneously. They also provide the possibility of quantifying the particular importance of the predictors for life-cycle estimations and of addressing the corresponding uncertainties for nowcasting applications.

SUMMARY AND CONCLUSIONS
The life cycles of convective storms in Germany are analysed taking into account the prevailing atmospheric ambient conditions by means of a unique object-based dataset. This dataset combines cell objects derived from DWD's cell detection and tracking algorithm, KONRAD, with high-resolution NWP assimilation analysis fields of COSMO-EU. The focus of the study is on isolated convection, which passed through an undisturbed life cycle without any impact of another convective cell in its vicinity, so cell clusters and several supercells have been filtered out. The general research questions are which multivariate statistical correlations exist between different variables describing the prevailing ambient conditions of convective cells, and which of these variables and which cell attributes exhibit statistical relationships to life-cycle attributes such as storm lifetime T and maximum horizontal extent A max. These analyses provide the basis for further investigations on how the data and findings can be used to develop statistical or machine-learning models that provide life-cycle estimations, with the objective of improving automated multidata real-time nowcasting procedures. Taking up the main research questions posed in Section 1, our conclusions are as follows.
(1) Under which range of prevailing ambient conditions does DMC develop, and how are the related ambient variables statistically correlated with each other and with cell attributes at the beginning of the cells' life cycle?
(a) Most of the isolated convective cells occurred in rather calm to moderate dynamical conditions, associated with some convective instability and moderate to high moisture amounts.
(b) A clustering, which bundles highly correlated variables in separate clusters, reveals one dynamical cluster, representing the midtropospheric flow and vertical shear (e.g., U 3-6, SRH 0-3, DLS), as well as two clusters consisting of thermodynamical and moisture quantities. Of the latter, the first cluster represents convective instability in the middle troposphere (e.g., LR 700-500, RH 700), whereas the second one consists of a collection of variables describing air mass temperature (e.g., T 850, h 0°C) and moisture (e.g., IWV), and of further convective instability indices (e.g., LI ML, CAPE MU).
(c) When the horizontal cell area A (Z ≥ 46 dBZ) and core area A C (Z ≥ 55 dBZ), their ratio, and the cell propagation speed c, as observed 5 min after the first cell detection, are included in the cluster analysis, the propagation speed joins the dynamical cluster, while the cell (core) area variables form a separate fourth cluster.
(d) Correlations between variables within the clusters can be very strong, for example, between CAPE MU and LI ML, and between DLS and U 3-6; the latter correlation shows that DLS is mainly determined by the midtropospheric wind. Cross-cluster correlations are mostly weak, but some reach values up to |r S| = 0.52, like the correlation between IWV and LR 700-500.
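The rank correlations r S quoted in these conclusions are Spearman coefficients. A minimal sketch of their computation, assuming average ranks for ties and the Pearson correlation of the ranks, is given below; the function names and the toy values are hypothetical, and the clustering step itself is not reproduced.

```python
def average_ranks(values):
    """Return 1-based ranks of values, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up CAPE (J/kg) and lifted index (K) values, for illustration only.
cape = [100.0, 500.0, 1200.0, 300.0, 900.0]
lifted_index = [1.0, -2.0, -5.5, -0.5, -3.0]
print(spearman(cape, lifted_index))
```

Because the coefficient depends only on ranks, it captures monotonic but non-linear relations, which is convenient for variables with strongly skewed distributions such as CAPE.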
(2) Which ambient variables and cell attributes correlate best with the storm properties lifetime and maximum area, indicating the potential for the improvement of nowcasting procedures?
(a) In general, the statistical relevance of ambient variables for the cells' lifetime and maximum area is rather low.
(b) The discrimination skill between short- and long-living cells is higher for dynamical variables than for the other two ambient variable clusters, presumably due to their influence on the degree of cell organisation.
(c) The discrimination skill between cells with small and large maximum cell area is similar for dynamical and thermodynamical variables, and it is slightly higher than the skill for the cell lifetime discrimination is.
(d) The cell and core area 5-35 min after the first cell detection show a (much) higher skill for discriminating between cells with short and long lifetimes or small and large maximum areas. Thus, the cell history is more important than ambient variables are.
(e) The highest univariate skill can be reported for the discrimination of the maximum cell area based on the cell (core) cluster (i.e., without any knowledge about the ambient conditions), particularly for the cell area 5-35 min after the first cell detection representing the initial growth of the cells.
(f) The bivariate skill for combinations of variables from two different clusters does not increase much when the cell (core) area is combined with an ambient variable, but it is the highest among all combinations. The mean lifetime is 25-30 min higher, and the mean maximum area is more than 30 km² larger for high wind shear and a large cell area 5 min after the first cell detection than for weak wind shear and a small cell area.
(g) The bivariate skill for combinations from two ambient variable clusters can be higher than the univariate skill is; for example, for the combination of DLS and LI ML. The mean differences between high-shear low-LI and low-shear high-LI conditions are 10-15 min for the lifetime and 10-15 km² for the maximum cell area.
The information about the ambient variables makes it possible to estimate the storm lifetime and the maximum area at least with some skill even before cells develop in the respective environments.
As presented by Wapler (2021), an axially symmetric parabola opened downwards describes the mean cell area evolution well. The study at hand revealed that this model has second-order limitations, as the timing of the maximum cell area shifts to later life-cycle stages with increasing lifetime. However, an advantage is that the parabola model can be easily refined by adding one ambient variable as a form parameter. Utilising, for example, LI ML , the analyses show that, for higher instability, the mean parabola curves are somewhat steeper and thus represent a faster growth during the first 15-30 min of the life cycle. Estimates for the maximum cell area and the lifetime with this refined model differ noticeably depending on instability. Still, the individual life cycles show high variability, and considerable uncertainties remain in these estimates stemming from the parameter calculation for the statistical regression model, as well as the inaccuracy of the definition of the cell area or cell age, caused, for example, by shadowing effects in the radar data or the KONRAD cell definition. An alternative, non-parametric approach to the parametric parabola model with a different perspective on cell life cycles, where cell evolution is represented by a flow field in a feature space, seems to be worth investigating in the context of probabilistic nowcasting procedures in the future.
The limitations of the lifetime or cell area discrimination and the corresponding differences of the mean values for specific ambient conditions, described in the last list item, may have several reasons. First, even if the mesoscale conditions as identified by model analyses appear very similar, the actual storm development characteristics can look very different depending on the trigger mechanisms and non-resolved local- or microscale processes (e.g., orographic features, low-level convergence, cloud processes). On days with similar widespread convection-favouring conditions, a high number of short-living and small cells may still occur (see Section 3.1), and only a few of these cells may develop into long-living and large cells. Second, as described in Section 2.3, the ambient variables from COSMO-EU were attributed to the KONRAD cells such that single statistical measures or numbers represent the cells' environmental conditions. What was not considered in the methods of this study are the mesoscale gradients around the position of the convective cells, which may give more information than the projection of the ambient variables onto the cell objects does. Third, the strict detection criterion and the tracking method of KONRAD cut off considerable parts of the cumulus and the dissipation stage of the cells. Moreover, a cell not fulfilling the detection criterion for one time instance during its real life cycle is split into two individual KONRAD cells. Fourth, only uni- and bivariate relations between the predictors (ambient variables and some cell attributes at the initial stage) and life-cycle properties T and A max were presented. Similar investigations of combined dependencies with more than two ambient variables, however, hardly reveal combinations with a sufficiently large sample size for establishing robust statistical relations and, at the same time, remarkably stronger signals with regard to the mean lifetime or mean maximum area (Wilhelm, 2022). Fifth, the ambient conditions are characterised by high-resolution model analysis data, which are a very good approximation but do not completely reflect reality. Sixth, owing to the different scales and processes involved, the chaotic behaviour of the atmosphere strongly affects the challenges listed herein.
For studying the potential additional benefit of considering more than two variables, for quantifying their particular importance for life-cycle estimations, and for addressing the corresponding uncertainties, advanced statistical or machine-learning methods have great potential to be successfully applied to the data (Ukkonen and Mäkelä, 2019; Mecikalski et al., 2021). The adaptation of the methods used by Sherburn et al. (2016) or Kunz et al. (2020), taking into account the spatial distribution of ambient variables, may provide more insights into the mesoscale processes involved and unravel more complex relations to storm properties, such as cell lifetime or maximum area. Beyond that, applying (convolutional) neural networks as a machine-learning-based approach might be beneficial for identifying such complex relations and the relative importance of the variable fields (Kamangir et al., 2020; Molina et al., 2021).
In terms of a multisensor/multidata approach (e.g., Nisi et al., 2014; James et al., 2018; Cintineo et al., 2020; Zöbisch et al., 2020), a combination of the existing dataset of this study with further data derived by radar, satellite, or lightning detection measurements could enhance the multidimensional picture of the measurable properties of convective cells. Advances in the cell detection and tracking algorithms, such as the currently pre-operational algorithm KONRAD3D of the DWD (Werner, 2020), relying on 3D radar reflectivity, will be able to describe the life cycles of convective cells more realistically and with a large variety of cell attributes. In addition to the basic information gathered via a 2D algorithm like KONRAD, such advancements will provide information about the vertical structure and the liquid water content of the cells. Furthermore, information obtained with modern dual-polarisation Doppler radar can be used for hydrometeor classification (e.g., Ryzhkov et al., 2005; Kumjian and Ryzhkov, 2008; Josipovic, 2020) or the automatic detection of mesocyclones (e.g., Hengstebeck et al., 2018; Wapler, 2021). Data from lightning detection yield further valuable information about convective storms (e.g., Farnell et al., 2017; Wapler, 2017). This opens up a large space of further possibilities for statistical life-cycle analyses once a sufficiently large sample of 3D cell objects has been generated.

(2021), a mean parabolic evolution of the cell area A for isolated convection, and a high variability between life cycles is apparent (Figure A1a,b). Based on multiyear KONRAD statistics and case studies of several prominent storms, the DWD implemented a parametric parabola life-cycle model for cell area evolution in combination with an ensemble Kalman filter into the successor KONRAD3D (Feger et al., 2019; Werner, 2020).
The parabolic curves for short lifetimes in Figure A1a,b are smoother than those for long lifetimes, as they are based on a larger number of single-cell life cycles. Especially during the first 15 min after the first detection of the cells, the mean growth rates of long-living cells appear to be higher than those of short-living cells with a lifetime below approximately 60 min. However, within a wide range of long-living cells, growth rates are rather comparable. The respective variation ranges strongly overlap, even with the ranges of short-living cells.

As a proxy for convection intensity, the ratio of cell core area A C (Z ≥ 55 dBZ) and entire cell area A (Z ≥ 46 dBZ) indicates the high-reflectivity fraction of a cell (Figure A1c,d). The mean ratio grows faster for long-living than for short-living cells. This finding, which supports the hints of Davini et al. (2012), underpins that initial rapid and intense cell growth (i.e., a rapid increase in both the cell area and the core area) can be a good indication of a long lifetime. The reason for this could be a previous rapid intensification of the cell's updraught, which promotes both vertical growth and horizontal extension of the deep convection cell. This may lead to precipitation formation in a large air volume, which is reflected in high reflectivity values shortly thereafter. Such a rapid cell development was observed, for example, for the supercell of July 28, 2013.
Similar to the approach of Weusthoff and Hauf (2008), the parabola family of cell area can be described by

$$A^{(T)}(t) = A^{(T)}_{\min} + 4\left(A^{(T)}_{\max} - A^{(T)}_{\min}\right)\frac{t}{T}\left(1 - \frac{t}{T}\right). \tag{A1}$$

Here, $A$ again represents the mean cell area, $T$ the lifetime (form parameter), and $t$ the cell age. The difference $A^{(T)}_{\max} - A^{(T)}_{\min}$ can be called the maximum evolution amplitude $\hat{A}$ of the mean cell areas. As can be estimated from Figure A1a, mean maximum amplitude $\hat{A}$ and lifetime $T$ are linearly highly correlated ($r_{\mathrm{Pearson}} = 0.74$, $r_S = 0.73$) so that $\hat{A}(T) \approx c_A T$. Linear regression without intercept yields $c_A = 0.295$ km²⋅min⁻¹, with an RMSE of the amplitude of 4.2 km². The minimum of cell area $A^{(T)}_{\min}$ is assumed to be taken at the first cell detection, showing similar values for the entire cell spectrum. Thus, a constant fit leads to $A^{(T)}_{\min} \approx A_0 = 21.3 \pm 1.1$ km² (RMSE). Hence, Equation (A1) may be rewritten as

$$A^{(T)}(t) \approx A_0 + 4\,c_A\,t\left(1 - \frac{t}{T}\right),$$

limiting the values of the parabola family to $A_{\mathrm{crit}}(t) = A_0 + 4 c_A t$ (black dashed line in Figure A2a). Normalising the cell amplitudes $A^{(T)}(t) - A_0$ by the maximum amplitude $\hat{A}$ and the time $t$ by the cells' lifetime $T$ reduces the parabola family to one single representative parabola describing the mean amplitude-normalised cell area evolution during a normalised life cycle (black dashed line in Figure A2b).

As can be seen from the observations in Figure A2b, and as Davini et al. (2012) reported for cells in northern Italy, with increasing (absolute) lifetime, the (relative) time when the maximum cell area is reached is shifted to later life-cycle stages, an asymmetry that is not reflected in the parabola model. One possible explanation could be that cells with a particularly intense and broad updraught, and thus a large vertical extent, achieve a long lifetime, as they increasingly expand horizontally during the life cycle. These cells possibly reach the largest cell area after the time of maximum storm intensity (in terms of high reflectivity area).
The somewhat faster relative decrease in the cell area towards the end of the life cycle could be attributed to a widespread weakening of precipitation formation in regions far from the residual cell core, which would lead to a reflectivity decrease to values below KONRAD's minimum detection criterion of Z = 46 dBZ there. In conclusion, an axially symmetric parabola describes the mean cell area evolution well but with second-order limitations concerning the timing of the maximum cell area during the life cycle.
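The parabola model can be evaluated numerically with a short sketch. It assumes the symmetric form A(t) = A 0 + 4 c A t (1 − t/T), which is the form implied by the limiting curve A crit (t) = A 0 + 4c A t and the linear amplitude relation quoted above. The constants are taken from the text, whereas the function names are hypothetical.

```python
A0 = 21.3     # km^2, constant fit of the minimum mean cell area (from the text)
C_A = 0.295   # km^2 per min, amplitude-lifetime slope (from the text)


def mean_cell_area(t, lifetime, a0=A0, c_a=C_A):
    """Mean cell area (km^2) at cell age t (min) for a given lifetime (min).

    Assumed symmetric parabola with amplitude c_a * lifetime at t = lifetime / 2.
    """
    if not 0.0 <= t <= lifetime:
        raise ValueError("cell age must lie within [0, lifetime]")
    return a0 + 4.0 * c_a * t * (1.0 - t / lifetime)


def critical_area(t, a0=A0, c_a=C_A):
    """Upper limit of the parabola family (lifetime -> infinity)."""
    return a0 + 4.0 * c_a * t


def normalised_area(tau):
    """Amplitude-normalised area as a function of normalised age tau = t / T."""
    return 4.0 * tau * (1.0 - tau)


def slope_through_origin(lifetimes, amplitudes):
    """Least-squares slope of amplitude = c * lifetime (regression without intercept)."""
    num = sum(T * a for T, a in zip(lifetimes, amplitudes))
    den = sum(T * T for T in lifetimes)
    return num / den


for lifetime in (30.0, 60.0, 120.0):
    t_max = lifetime / 2.0
    print(lifetime, mean_cell_area(t_max, lifetime), critical_area(t_max))
```

In this form, the maximum mean area for a given lifetime T is A 0 + c A T, reached at t = T/2, so the observed shift of the maximum to later life-cycle stages for long-living cells cannot be represented without an additional asymmetry parameter.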

A.2 Additional table and figures

FIGURE A3
Combined illustration of deep-layer shear (m⋅s⁻¹), as calculated from COSMO-EU assimilation analyses, and the KONRAD track of a long-living thunderstorm cell in central Germany on September 11, 2011 (1500 UTC). The KONRAD cell is depicted for all following 5-min detections by light blue rectangles, which enclose all radar pixels belonging to the respective detection. For the first cell detection at 1500 UTC (green rectangle), the cell surrounding is drawn as the outer black circle, obtained by first drawing the inner circle enclosing the rectangle minimally and then adding the fixed radius R fix.
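The construction of the cell surrounding described in this caption can be sketched as follows, assuming planar Cartesian coordinates in kilometres. The smallest circle enclosing an axis-aligned rectangle is centred on the rectangle and has half its diagonal as radius. The function name and the value used for R fix in the example are hypothetical.

```python
from math import hypot


def cell_surrounding(x_min, y_min, x_max, y_max, r_fix):
    """Return (centre, inner_radius, outer_radius) for a cell bounding rectangle.

    inner_radius: radius of the minimal circle enclosing the rectangle.
    outer_radius: inner_radius enlarged by the fixed radius r_fix.
    """
    centre = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    inner_radius = hypot(x_max - x_min, y_max - y_min) / 2.0
    return centre, inner_radius, inner_radius + r_fix


# Hypothetical 6 km x 8 km rectangle and a hypothetical r_fix of 25 km.
centre, r_inner, r_outer = cell_surrounding(0.0, 0.0, 6.0, 8.0, r_fix=25.0)
print(centre, r_inner, r_outer)
```

Ambient variables could then be aggregated over all model grid points whose distance from the centre does not exceed the outer radius; on a latitude-longitude grid, a great-circle distance would be needed instead of the planar one assumed here.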