Potential of stochastic methods for improving convection-permitting ensemble forecasts of extreme events over the western Mediterranean

The design of convection-permitting ensemble prediction systems capable of producing accurate forecasts of disruptive events is an extraordinarily challenging effort. The difficulties associated with the detection of extreme events found at these scales motivates the research of methodologies that efficiently sample relevant un- certainties. This study investigates the potential of multiple techniques to account for model uncertainty. The performance of various stochastic schemes is assessed for an exceptional heavy precipitation episode which occurred in eastern Spain. In particular, the stochastic strategies are compared to a multiphysics approach in terms of both spread and skill. The analyzed techniques include stochastic parameterization perturbation ten- dency and perturbations to influential parameters within the microphysics scheme. The introduction of stochastic perturbations to the microphysics processes results in a larger ensemble spread throughout the entire simulation. Conversely, modifications to microphysics parameters generate small-scale perturbations that rapidly grow over areas with high convective instability, in contrast to the other methods, which produce more wide- spread perturbations. A conclusion of specific interest for the western Mediterranean, where deep moist convection and local orography play an important role, is that stochastic methods are shown to outperform a multiphysics-based ensemble for this case, indicating the potential positive impact of stochastic parameterizations for the forecast of extreme events in the region.


Introduction
Since the pioneering works of Lorenz in the 1960s, it is extensively recognized that numerical weather forecasts are uncertain, and that small differences amplify with time, eventually rendering a forecast useless (Lorenz, 1963(Lorenz, , 1969. Although the theoretical frameworks of Liouville and Fokker-Planck equations (Hasselmann, 1976;Ehrendorfer, 1994) provide a basis to describe the time evolution of uncertain systems, their practical application in operational weather forecasting is unachievable and thus, currently restricted to low complexity systems (Hermoso et al., 2020a). In this context, ensemble prediction systems (EPS) emerge as a feasible alternative tool to account for the inherent uncertainties in numerical weather prediction, specifically initial condition and model errors.
Regarding initial conditions, different methodologies are used for the global and synoptic scales with the aim of identifying fast growing modes and optimizing the sampling of plausible states under the everpresent limitation of computational resources. Popular approaches include the singular vectors technique (Molteni et al., 1996), used at the European Centre for Medium Range Weather Forecasts (ECMWF), and bred vectors (Toth and Kalnay, 1993), which have been used at the National Center for Environmental Prediction (NCEP). Alternatively, ensemble data assimilation is an ensemble generation strategy designed to estimate the flow-dependent uncertainty of the forecast, accounting for and propagating uncertain information originating from observations and background fields. Multiple operational centers exploit this approach to initialize global scale ensembles (Whitaker et al., 2008;Houtekamer et al., 2009;Kazumori, 2014, among others). Although error sampling methodologies for large scales are mature (e.g., Buizza et al., 2005;Buizza and Leutbecher, 2015), the translation of these techniques to smaller scales and limited area configurations is not straightforward. Indeed, not only initial, but also boundary condition uncertainty must be sampled owing to the use of limited area models. A prevailing strategy to generate initial and boundary condition perturbations for regional ensembles directed at representing large scale uncertainty consists of the dynamical downscaling of a coarser-resolution global ensemble (e.g., Stensrud et al., 1999;Marsigli et al., 2014;Kühnlein et al., 2014). Furthermore, the adaptation to regional scales of techniques that dynamically sample growing modes, mainly breeding, has been attempted (Thanh et al., 2016;Hermoso et al., 2020b). Additionally, the use of 3D variational perturbations to generate small-scale perturbations blended with large-scale perturbations from a global ensemble has also been explored (Keresturi et al., 2019). In any case, ensemble data assimilation systems, based on conventional observations (i.e., surface data, radiosondes, buoys, among others), and also remote sensing information, such as radar information (e.g., Wheatley et al., 2014;Wang and Wang, 2017;Degelia et al., 2018;Carrió et al., 2019) or satellite derived products (e.g., Romine et al., 2013;Velden et al., 2017;Zhang et al., 2018;Okamoto et al., 2019), constitute the usual initialization method for meso-and convective-scale EPS. The initialization and evolution of surface variables are also an important error source. Indeed, some operational ensembles introduce perturbations to sea surface temperature and/or soil variables to sample uncertainties associated with these fields (e.g., Tennant and Beare, 2014;Bouttier et al., 2016).
Convection-permitting EPS have been shown to improve over coarser resolution predictions (e.g., Duc et al., 2013;Schellander-Gorgas et al., 2017;Klasa et al., 2018;Gowan et al., 2018). However, convective ensemble forecasts are usually underdispersive, that is, ensemble spread is lower than forecast errors (e.g, Clark et al., 2011;Vié et al., 2011;Schwartz et al., 2014;Romine et al., 2014;Chen et al., 2018). The effective prediction of deep convection is complex due to the nonlinearities that dominate in small-scale processes (Hohenegger and Schär, 2007;Melhauser and Zhang, 2012;Sun and Zhang, 2016). This issue severely restricts the predictability horizon of socially relevant aspects of high-impact phenomena, such as location, timing, and intensity of convection, to extremely short lead times of the order of a few hours (e.g., Cintineo and Stensrud, 2013;Surcel et al., 2015;Zhang et al., 2015Zhang et al., , 2016. This limited predictability is in turn highly casedependent (Frogner et al., 2019), and it is strongly conditioned by the ensemble generation strategy as well as local and case-dependent characteristics. For instance, Bachmann et al. (2019Bachmann et al. ( , 2020 identify an increase in predictability related to the presence of orography and the use of radar data assimilation. The difficulties associated with convective-scale forecasting are particularly severe in the Mediterranean region. The area is frequently affected by heavy precipitation events linked to deep convective activity (Ramis et al., 1998;Homar et al., 2002;Nuissier et al., 2008;Fiori et al., 2014;Faccini et al., 2016;Lorenzo-Lacruz et al., 2019, among others), especially during late summer and autumn (Llasat et al., 2010). The relatively warm sea during this time of the year acts as a heat and moisture source which, in combination with upper-level cold disturbances, generates convective instability that favors the triggering of deep moist convection. Furthermore, the Mediterranean Sea is surrounded by prominent mountain ranges (e.g., Atlas, Pyrenees, Alps) and upland areas located near the coast, which play an important role in the development of convective activity (e.g., Ramis et al., 1998;Ducrocq et al., 2008). Indeed, complex topography poses a supplementary challenge to precipitation forecasting as has been shown in multiple studies over different regions (e.g., Goger et al., 2016;Li et al., 2016;Yáñez Morroni et al., 2018). Moreover, the presence of many small-to-medium size catchments in the area with hydrological response times of a few hours, combined with the high population density of the Mediterranean coastal areas make this region highly vulnerable to flash flooding. Hydrometeorological forecasting chains can extend the period to take protective action beyond the short response times of small catchments, thus providing civil protection agencies with valuable information in order to mitigate the devastating effects of flash floods. Accurate highresolution quantitative precipitation forecasts (QPFs) are required to produce useful flash flood forecasts. The characteristics of the watersheds in the area demand very precise forecasts of intensity and location of rainfall. However, the chaotic nature of convective precipitation critically hampers the accuracy of QPFs at the scale of interest.
Although initial and boundary condition perturbations can generate substantial diversity, and constitute the core of many studies aimed at improving short-range probabilistic forecasts (Peralta et al., 2012;Johnson and Wang, 2016;Keresturi et al., 2019;Hermoso et al., 2020b, among others), it is widely acknowledged that model error must also be considered to comprehensively account for uncertainties at meso-and convective-scales. This important source of uncertainty is related to the representation of specific physical processes, mainly those not explicitly resolved by the model. Different strategies have been proposed to sample uncertainty in model formulation. One of them consists of the combination of multiple models from distinct operational centers. The rationale behind this method is that the combination of the most advanced and accurate models developed at multiple centers can adequately reflect uncertainties present in model formulation. Although positive results have been obtained with this methodology, even at regional scale (Hou et al., 2001;García-Moya et al., 2011;Beck et al., 2016), this strategy requires substantial human resources to simultaneously maintain various models. Considering that the major source of model error is the representation of subgrid processes, a more popular elementary approach is to build a multiphysics ensemble. The technique is based on using multiple combinations of physical parameterization schemes, across the ensemble. This methodology shows good performance in terms of ensemble diversity and skill (e.g., Hacker et al., 2011b;Berner et al., 2011;Wang et al., 2011;Amengual et al., 2017). However, this approach presents some downsides: (i) each ensemble member has a different climatology and error characteristics, and (ii) its inclusion in operational tasks is costly as it requires the development and maintenance of various parameterization schemes, which in turn need to be physically consistent with each other. A variant of this method is the multiparameter approach (e.g., Hacker et al., 2011a;Yussouf and Stensrud, 2012;Duda et al., 2017) in which only specific parameters are modified among different ensemble members. Even though this approach allows a more specific representation of uncertainty and suppresses the requirement of dealing with distinct schemes, it generally displays a poorer performance compared to the classical multiphysics method (e.g., Berner et al., 2015).
An additional standpoint, which is receiving increasing attention recently, consists of inserting stochastic perturbations as a way to represent subgrid model error and introduce spread . The most popular techniques implemented in operational applications are the Stochastic Perturbed Physics Tendency (SPPT; Buizza et al., 1999) and the Stochastic Kinetic Energy Backscatter (Shutts, 2005). In the latter, stochastic perturbations are applied to potential temperature and streamfunction tendencies in order to account for the dissipated kinetic energy that interacts with the flow at the resolved scales, while in the former, the physical parameterizations tendencies in the state variables equations are perturbed. This is achieved by multiplying the tendency by a random perturbation which is spatially and temporally correlated. The characteristics of the random pattern (spatial and temporal decorrelation and variance) can vary depending on the application. For instance, in the original development of Buizza et al. (1999), random numbers were constant within boxes of certain side and over determined time steps and with diverse patterns for each state variable. The version currently operationally implemented at ECMWF uses a prescribed space and time correlation, includes a tapering to 0 for the lowest and top model levels and applies the same pattern to all state variables (Palmer et al., 2009). A similar version has been used in research studies with the Weather Research and Forecasting (WRF) model, although with no tapering (Berner et al., 2015). Even though the SPPT scheme has shown good performance (e.g., Berner et al., 2011), it presents some physically questionable aspects. The perturbation is applied to the full physics tendency, which generates several troublesome results: (i) the effect of the perturbation is zero when the net physical tendency is zero, although contributions -and thus, errorsfrom specific parameterizations may be high, (ii) error characteristics are assumed to be equal among the different parameterizations, and (iii) the perturbation is not applied when and where uncertainty is originated (i.e., within the parameterization formulation), but at a later stage. Furthermore, this stochastic procedure to sample model error can generate numerical instabilities, mainly at the boundary layer (Lock et al., 2019).
Distinct suggestions have been made to solve some of the aforementioned issues of SPPT. The random pattern can be applied to specific parameters of each parameterization scheme, thus targeting the perturbation at the process level. This technique is called Stochastically Perturbed Parameters (SPP; Jankov et al., 2017). Although some positive results are reported (improvement in surface variables forecast at ECMWF, Ollinaho et al., 2017), SPP generally produces an insufficient spread (Christensen et al., 2015;Jankov et al., 2017), so that SPPT usually outperforms SPP. However, recent progresses on SPP have led to a similar performance of this technique compared to SPPT (Lang et al., 2021). Furthermore, the random parameters method designed at the UK Met Office (Bowler et al., 2008;McCabe et al., 2016) is based on randomly perturbing selected parameters, so that they vary over the forecast lead time, but are kept fixed for the whole domain at a specific lead time. McCabe et al. (2016) describes a positive impact in spread and skill in surface variables when this procedure is applied. An additional proposed modification of SPPT introduces independent perturbations for each parameterization scheme to solve the problematic of cancelling tendencies. Christensen et al. (2017) applied this approach to the Integrated Forecasting System of ECMWF, finding an increase in spread in areas with strong convective activity. Wastl et al. (2019a) investigate the implementation of this idea for a regional EPS and Wastl et al. (2019b) combine this methodology with SPP, obtaining a significant improvement of spread, especially when convective activity is intense.
The optimal strategy to represent relevant uncertainties in convection-permitting ensembles remains unclear. The challenges associated with convective-scale forecasting and the need to alleviate the consequences derived from extreme weather events motivate research focused on all methods available to sample uncertainties. In this context, the aim of this study is contributing to answer this essential question regarding the optimal ensemble generation strategy for the convective-scale by focusing on model error. The potential of various stochastic schemes to generate skilful forecasts for a Mediterranean high-impact episode is tested and compared to the multiphysics technique. This approach is a popular methodology to account for model error, which is applied in numerous operational ensembles worldwide (e.g., Descamps et al., 2015;Du et al., 2015), and is also widely used in the hydrometerological forecasting context (Lagasio et al., 2017;Tian et al., 2019;Roux et al., 2020). In the analysis, special emphasis is put on certain local specificities, which are of the utmost importance due to the topographic and hydrological particularities of the area. The remainder of the paper is structured as follows: Section 2 describes the most important characteristics of the case study selected, in Section 3 the model and ensemble configurations used in the different experiments are detailed, results are presented and discussed in Section 4, and conclusions are provided in Section 5.

Case study
A heavy precipitation episode took place on 12 and 13 September 2019 over eastern Spain. Total accumulated precipitation reached 500 mm at some sites, and also intensities in excess of 30 mm in 10 min were registered at some rain gauges. The intense rainfall and the subsequent widespread flash flooding that developed across the regions of València, Murcia and Almería ( Fig. 1) produced devastating effects including 7 fatalities and economic costs in excess of 425 milion EUR (Consorcio de Compensación de Seguros (CCS), 2019).
The synoptic situation on the days prior to the onset of the episode displays the classical pattern that precedes heavy precipitation over eastern Spain (Romero et al., 1999): a deep trough over the north Atlantic at upper-levels, which moved southward towards the Iberian Peninsula during 10 and 11 September. This trough evolved to a cut-off, which remained roughly stationary during 12 and 13 September over north Africa. At low levels, the combination of a cyclonic system over north Africa and an anticyclone over the north Atlantic resulted in an easterly warm and moist flow which impinged the eastern coast of Spain between 11 and 13 September. As a result, the lower troposphere was convectively destabilized, the continuous moisture supply resulted in both high precipitable water levels and sustained water vapor flux convergence along the south-eastern coast of Mediterranean Spain.
Three differentiated phases of the episode arise from the analysis of observational data. (see Hermoso et al., 2021 for a detailed observational and physical analysis of this case study): • Phase 1 (12 September 00-06 UTC): This phase is defined by the presence of an extended area of high equivalent potential temperature (EPT) and convective instability over the southwestern Mediterranean. A persistent convective band formed over the València region near Cap de la Nau promoted by the local orography of the area. Total rainfall over 200 mm was registered in some locations during this phase. • Phase 2 (12 September 06-18 UTC): The area of strongest instability extended southward and individual convective cells were triggered over the sea. These cells organized in a linear precipitation structure, which produced a record of 180 mm in two hours during the most active period of this phase. • Phase 3 (12 September 18-13 September 12 UTC): It was the longest and most intense phase. The principal feature was the formation of a mesoscale convective system (MCS) that affected Murcia during the last hours of 12 and first hours of 13 September. The local topography anchored persistent precipitation systems, with maximum 10min intensities exceeding 32 mm, at some rain gauges in Murcia. Additionally, western Almería was affected by convective bands with accumulations exceeding 200 mm in 6 h. From 13 September 05 UTC, the MCS moved northward affecting the València region.
The particularities of this case, specifically its long duration, the combination of different convection initiation mechanisms, with marine and orographic triggering, and the characteristics of the catchments embedded in the study area, with extensions in the range between 100 and 1000 km 2 , make it an excellent testbed for convection-permitting ensemble strategies. The consideration of a single case study does not allow statistically significant conclusions to be drawn, but the results may point to promising directions in the prediction of extreme events. The statistical analysis is beyond the scope of this study and is left as future work.

Methodology
The performance of multiple stochastic techniques to predict the impacts of the 12-13 September 2019 exceptional heavy precipitation episode is investigated by using the WRF-ARW version 3.9.1 (Skamarock et al., 2008). A domain centered over the western Mediterranean is defined (Fig. 1). It has a horizontal resolution of 2.5 km and 50 vertical levels up to 50 hPa. The time step is set to 12 s. Given the long duration of the severe activity, the period is split in two independent 30-h simulations to benefit from an analysis update of the global model before the onset of phase 3. Indeed, attempts to cover the entire episode with a single simulation resulted in low skill during phase 3. Note that no data assimilation cycle is applied in these experiments. Therefore, the first simulation covers the period 11 September 18 UTC -13 September 00 UTC, while the second one spans from 12 September 18 UTC to 14 September 00 UTC. Only the last 24 h of each simulation are evaluated, whereas the first six hours are left as a spin-up period.
Initial and boundary conditions are extracted from the dynamical downscaling of the ECMWF-EPS, which has a horizontal spectral resolution of O640 (~ 18 km at midlatitudes) and includes 91 vertical levels up to 0.01 hPa. Initial condition perturbations in ECMWF-EPS are built by combining ensemble data assimilation and singular vectors (Isaksen et al., 2010;Bonavita et al., 2017). These perturbations are added to the deterministic analysis, while model error is sampled by means of SPPT. A multi-scale pattern is used with a combination of three different variances, and temporal and spatial correlations which are specified for each individual pattern. Furthermore, a tapering function is introduced to reduce perturbation amplitude at the boundary layer and near the stratosphere . From the 50 members of the ECMWF-EPS, the 10 initial and boundary conditions that display the greatest variability over the defined numerical domain are selected by using a k-means clustering. All ECMWF-EPS members are classified in 10 clusters based on their geopotential height, temperature and relative humidity at 500, 700 and 850 hPa. The 10 closest members to cluster centroid are considered. From these 10 initial and boundary conditions, multiple ensembles of 50 members are constructed with distinct techniques to account for model uncertainty (i.e., multiphysics and stochastic methods), which are thoroughly described below. Since the objective of this study is investigating the impact of model error sampling, no additional initial condition perturbations are introduced, and thus differences among experiments are attributable to the model error methodology. However, a certain degree of variability in initial and boundary conditions is kept, so that the configuration presents some similarities to operational settings. It is noteworthy that these ensembles are prone to be underdispersive owing to the lack of small-scale initial perturbations. This could magnify the benefit of the most dispersive model perturbation strategy, when it may only be counteracting the lack of small-scale initial perturbations.
The baseline physical configuration includes the RRTMG radiation scheme (Iacono et al., 2008), the RUC land-surface model (Smirnova et al., 2016), the NSSL 2-moment microphysics (Mansell et al., 2010) and the Mellor-Yamada and Nakanishi Niino level 2.5 planetary boundary layer (PBL; Nakanisi and Niino, 2006). Convection is explicitly resolved with this set-up, and thus convective parameterization is switched off. For the multiphysics experiment this suite is one of the combinations applied.

Multiphysics (MPS)
Considering the importance of convective processes involved in this episode, cumulus parameterizations would be a leading contributor to model uncertainty in coarse resolutions (e.g., Jankov et al., 2005). However, convection is mostly explicitly resolved in kilometric-scale simulations. In this context, microphysics and PBL are the parameterized mechanisms that most significantly affect deep moist convection. Microphysics accounts for phase transitions between water species, and thus this scheme can produce considerable effects on storm development. Indeed, recent studies emphasize the role of microphysics processes at convective scales in cases of orographic precipitation (Morales et al., 2019) or supercell development (Qiao et al., 2018). As a matter of fact, microphysics perturbations can yield impacts of the same order as initial condition modifications in some situations . The PBL parameterization alters temperature and moisture in the lower troposphere, which is of the utmost importance for convective development (Coniglio et al., 2013). Moreover, turbulent processes in the boundary layer exert an important influence on convection triggering, especially under weak synoptic forcing (Kober and Craig, 2016;Keil et al., 2019).
As an illustrative example, microphysics and PBL temperature tendencies at a model grid point are depicted in Fig. 2. Microphysics tendencies are more intense in areas and times with active convective development (mainly during the unfolding of phase 3 at the end of this simulation), and exhibit lower spatial and temporal correlation than PBL tendencies. Moreover, as expected the effect of the PBL tendencies is only significant at the lowest 1000 m.
The 50 MPS ensemble members are generated by using 5 different physics suites combined with the 10 different initial conditions. Considering the high influence of microphysics processes, three schemes are included for this parameterization: NSSL 2-moment, WRF Single-Moment 6-class (WSM6; Hong and Lim, 2006) and Thompson (Thompson et al., 2008). Conversely, only two PBL options are selected: Mellor-Yamada and Nakanishi Niino (MYNN) level 2.5 and Mellor-Yamada-Janjic (MYJ; Janjic, 1994). The choice of these parameterizations is based on the compatibility among schemes and their suitability for kilometer-resolution simulations. Among the options that do not incorporate any of the schemes of the baseline configuration, only the combination of WSM6 and MYJ is adopted, as these options constitute the standard suite of the operational configuration used by the research Meteorology Group at the University of the Balearic Islands (http ://meteo.uib.es/wrf). A summary of the parameterizations considered is provided in Table 1.

Stochastically perturbed physics tendencies (SPPT)
The SPPT methodology consists of stochastically perturbing the physics tendencies originating from subgrid processes. It was originally introduced by Buizza et al. (1999), and has been used in many operational and research applications. The idea is that mechanisms occurring on unresolved scales should be sampled from a distribution rather than represented by the equilibrium mean (Berner et al., 2015). The version used in this study is the one implemented in WRF v3.9.1. Physics tendencies of potential temperature perturbation (θ), zonal and meridional wind components (u, v), and specific humidity (q) at each grid point are multiplied by a random number: where X phys stands for the tendency of variable X ∈ T, u, v, q and r(x, y, t) is the random pattern, which is a space and time dependent function. The perturbation is constant with height. Prescribed spatial and temporal decorrelation parameters are used to ensure the smoothness of r(x, y, t). The random field r(x, y, t) is defined as a 2D inverse Fourier transform: where the total (K + 1)(L + 1) wavenumber components in zonal (x) and meridional (y) directions in physical space are denoted by k and l. Spectral coefficients r k,l are computed through a first-order autoregressive process: which is controlled by the decorrelation time τ. ε k,l (t) is a Gaussian white noise process with 0 mean and variance equal to 1. Amplitudes g k,l correspond to a two-dimensional Gaussian in spectral space: where κ is the decorrelation length, σ k,l 2 the spectral variances and ρ k, The definition of g k,l is made such that the variance at any grid point σ 2 , which is an additional user-specified parameter, corresponds to the total variance in spectral space. Hence, the characteristics of the random pattern are established by the parameters τ, κ and σ. Being aware of the temporal and spatial scales involved in the convective systems developed during the episode, τ is set to 1 h and κ to 100 km (Table 3, Fig. 3). These values are similar to those used in other convective-scale studies (e.g., Romine et al., 2014). The variance is fixed to 0.25, which is the maximum value that does not generate numerical instabilities, and the amplitude of r(x, y, t) is restricted to the interval [− 2σ,2σ]. In some implementations of SPPT a tapering of the stochastic perturbations near surface and model top levels is introduced (Palmer et al., 2009;Bouttier et al., 2012) in order to prevent numerical instabilities. The WRF SPPT scheme does not include this tapering, but no numerical instabilities are found. As a result, a greater impact on surface variables can be achieved (e.g., Romine et al., 2014). An important drawback of this version is the exclusion of microphysics tendency perturbations. The effect of this parameterization in WRF-ARW is added after the state is updated with the other physical tendencies in order to preserve physical balances. Because of this different treatment of the microphysics processes, its inclusion in SPPT could result in a double counting of these perturbations (Jankov et al., 2019).

Microphysics random parameters (MPRP and SPPT_MPRP)
The relevance of microphysics process for moist convection dynamics (e.g., latent heat exchanges during phase changes of microphysics species), together with the lack of precise information at the subgrid scales, motivates the development of a stochastic scheme that copes with the uncertainties associated with these processes.
The proposed method to perturb microphysics follows the update of McCabe et al. (2016) of the random parameter scheme originally developed by Bowler et al. (2008). A set of parameters are perturbed so as to vary along forecast lead time, but are fixed for the whole domain. The time evolution of the random pattern p is governed by a first-order regressive process, similarly to SPPT, although in this case, only the temporal variation is considered: The values of the parameters are limited within a predefined range, based on physical properties. In order to avoid the introduction of biases in the case that the default value does not correspond to the center of the parameter range, the updated value of the parameters is adjusted such that p = 0 matches the default value, while p = − 1 coincides with the minimum value, and p = 1, with the maximum. A boundary condition is applied at the limits (i.e., p = ±1) to avoid peaks in the distribution of p at the bounds. The modified pattern p* is obtained by reflecting p into the [− 1,1] range: otherwise.
The random process ε(t) is formulated in terms of the autocorrelation et al., 2016): where χ(t) is a random number obtained from a uniform distribution in the interval [− 1,1]. Given the mean duration of the influence of microphysics in convective processes (Fig. 2), the decorrelation parameter is fixed at 30 min (Table 3, Fig. 3). Some parameters of the NSSL 2-mom scheme are perturbed in order to account for uncertainties in important microphysics processes for convective-scale forecasting. The parameters selected are described below and a summary is presented in Table 4: • Cloud condensation nuclei concentration (CCN): The perturbation of this quantity is motivated by the specificities of the Mediterranean region. Its geographical location facilitates the intrusion of air masses of different origin and aerosol content. Thus, incursions of air with high particle content originated in the Saharan desert, continental air masses from central Europe or clean maritime air stemming form the Atlantic are common (e.g., Prospero, 1996;Clarke et al., 1997). This variability in CCN can have a substantial influence on precipitation processes (e.g., Hudson and Mishra, 2007;Barthlott and Hoose, 2018;Keil et al., 2019), although the impacts are variable and largely depend on the particular environmental conditions (White et al., 2017). • Graupel and hail fall speed coefficient (graupel/hail fallfac): This factor modifies the fall speed of ice particles. This alteration can have an influence on mixing ratios and melting at low levels, affecting downdrafts and cold pool intensity (Falk et al., 2019). Furthermore, including perturbation to hydrometeors fall speed led to improvements in convective scale precipitation forecasts . • Saturation percentage for initial cloud formation (ssmxinit): This parameter controls the number of activated CCN that can generate cloud droplets (Phillips et al., 2007). The number of activated CCN is an important property for convective clouds, as it controls the rate of droplet growth with cloud depth and the conversion into precipitation-sized particles (Freud et al., 2011).
Based on this design, two different ensembles are produced: (i) MPRP, in which only the microphysics scheme is stochastically perturbed following the above methodology, and (ii) SPPT_MPRP, wherein these perturbations to microphysics are combined with the sampling of uncertainties from other parameterizations by means of SPPT. A summary of all configurations tested is provided in Table 2. The former experiment is aimed at examining the isolated influence of microphysics perturbations to effectively introduce diversity, while the latter is     A. Hermoso et al. designed to ascertain the mixed impact of perturbations from multiple error sources. It is noteworthy in the highly nonlinear regime of the convective-scale, that the sum of the individual effects of each perturbation may not produce an additive result.

Verification scores
Different probabilistic verification scores are used to assess the ability of the stochastic ensembles proposed to adequately sample the probability density function of scenarios consistent with the uncertainties. In this sense, we focus on socially relevant aspects of the episode, and thus the verification is directed to rainfall. In particular, 3-h accumulated precipitation is considered and three distinct thresholds are set (20, 35 and 50 mm). These values are selected as a compromise between using extreme accumulations and getting statistically significant results. Radar estimated precipitation fields are used for the verification. This information is derived from the reflectivity volumescanning of the three radars located in the study area (València, Murcia and Almería), and calibrated by means of automatic rain gauge data. Further details on this procedure are explained on Hermoso et al. (2021). Radar data is used instead of station data due to the high observational density of the radar dataset over the study area. Observation errors are not considered in the verification.
The model grid length used is 2.5 km, but the effective resolution (i. e., the resolution in which structures are predictable) is lower. The skill of convection-permitting precipitation forecasts at grid scale is generally low (e.g., Ebert, 2008). The neighborhood maximum ensemble probability (Schwartz and Sobash, 2017) is applied to mitigate this problem. In this approach, a grid point has a value of 1 (i.e., occurrence of the event) if the binary event occurs at any grid point contained within a predefined distance. This method is more appropriate for severe weather than averaging a binary value across the neighborhood given the low frequency of extreme events. A square of 20 km side centered on each grid point is used. This spatial scale is of the same order of the extension of the watersheds contained in the study area, and thus is suitable to assess the skill of the different methods considered.
The Brier score compares the forecast probability of a binary event extracted from the ensemble to the occurrence of the event for each available observation (Jolliffe and Stephenson, 2003). The score is negatively oriented, with values ranging from 0 (perfect forecast) to 1. However, note that with the application of the neighboring approach, a perfect score can be obtained with spatial shifts up to 20 km in the forecasted precipitation field. An advantage of this score is that it can be decomposed in three independent components that represent different properties of the ensemble: reliability, resolution and uncertainty.
In order to present a better comparison between the different methodologies, the Brier score is transformed to a skill score, taking the MPS ensemble as reference: Null BSS indicates an identical skill, while positive (negative) values imply an improvement (degradation) with respect to the reference.
Additionally, Receiver Operative Characteristics (ROC; Mason, 1982;Mason and Graham, 2002) are computed, as this curve is widely used to evaluate the skill of probabilistic forecasts. It compares the hit and false alarm rates for distinct probability thresholds, and the area under the ROC measures the ability of the ensemble forecast to discriminate between events and non-events. A value of 1 is obtained for a perfect probabilistic forecast, while 0.5 corresponds to no skill above the random forecast.
The statistical significance of the verification scores computed across the study is assessed by means of a bootstrapping technique. 95% confidence intervals are generated from 1000 resamples with replacements. However, given the limited volume of data considered, this significance test can only be used to identify relatively important differences among experiments for the weather situation analyzed in this study.

Ensemble characteristics
Before analyzing the performance of the different ensemble generation strategies proposed, their main features in terms of extremes and diversity in accumulated precipitation are inspected. 95th percentiles and interquartile ranges are used to quantify extremes and spread, respectively. Over the course of phase 1 (Fig. 4a-d), ensembles containing perturbations to microphysics yield the largest spread and higher rainfall amounts than MPS and SPPT, with 6-h precipitation in excess of 50 mm in the 95th percentile. However, a northward displacement error with respect to the observed position of the thin convective band is identified, and reaches near 50 km for the SPPT_MPRP.
During the unfolding of phase 2 ( Fig. 4e-h), all ensembles render similar diversity and precipitation quantities over land, and only minor differences among experiments are detected over sea, where ensembles based on microphysics perturbations display the largest spread. At the end of phase 2, ensemble spread highlights the area of convective development over the sea, and MPS features more widespread variability than stochastic-based ensembles. At this stage, the differences between SPPT and the other stochastic ensembles are lower, with similar spread and maximum values located off the coast of Almería.
This behavior continues at the beginning of phase 3, while significant spread extends along the littoral of Murcia and Almería ( Fig. 4i-l), where intense convective activity occurred. Diversity is larger for stochastic techniques, especially those acting on microphysics perturbations. The extension of the area with notable ensemble spread adjacent to the coastline reflects changes in precipitation location sampled by the ensembles, and the maximum variability is found transversal to the coastline in accordance to the observed evolution of storms. Indeed, the spread provides an uncertainty quantification of rainfall location and intensity.
During the first hours of 13 September, when the MCS was in its mature stage, clear differences between MPS and ensembles including stochastic perturbations are detected (Fig. 5). Both ensemble diversity and maximum 6-h precipitation accumulations are smaller for MPS. In addition, areas of larger spread lay offshore València and Almería, while stochastic methods provide larger diversity over land at southeast Murcia and Almería. This latter structure is consistent with the areas where deep moist convection developed, namely near Murcia owing to the presence of the MCS and Almería, which was affected by intense convective bands. Comparable results are reproduced during the northward evolution of the MCS at the end of the episode. The largest spread is found over the sea north and south of Cap de la Nau, and is higher for the stochastic ensembles.
The larger diversity generated by the stochastic schemes is consistent with the results of other studies (e.g., Romine et al., 2014;McCabe et al., 2016;Jankov et al., 2017;Gasperoni et al., 2020). Indeed, the areas with higher diversity reveal intensity and location variability among ensemble members and are consistent with the main features of the heavy precipitation episode. This result is particularly relevant for probabilistic hydrometeorological forecasts. An accurate representation of the uncertainty associated with rainfall intensity and location is crucial, due to the highly nonlinear effects in the hydrological response related to small variations in these quantities. Moreover, the stochastic procedure applied to the microphysics processes may be suitable to improve model error sampling for these kind of episodes characterized by prominent maritime activity and deep moist convection. The relevance of microphysics in heavy precipitation events involving deep moist convection is well established (e.g., Keil et al., 2019;Lupo et al., 2020), owing to multiple microphysical processes related to condensation and the presence of multiple frozen hydrometeors.

Environment characteristics
The efficiency of the proposed techniques to effectively introduce modifications to the thermodynamical environment is investigated. For the sake of simplicity, and as an illustrative example, an individual member of each stochastic ensemble is compared to the corresponding MPS member (unperturbed). Note that the differences between these simulations are attributable to the stochastic perturbations, because initial and boundary conditions, as well as physical parameterizations remain unchanged. Convective available potential energy (CAPE) is analyzed as a measure of instability. The value considered is the most unstable CAPE.
Stochastic modifications to microphysics parameters initially generate small-scale perturbations to CAPE in areas with convective instability, in which microphysics is active. However, these perturbations only grow in intensity and acquire higher spatial correlation in areas where deep moist convection is developing, mainly over the southwestern Mediterranean (Figs. 6 and 7). Conversely, SPPT introduces larger scale perturbations (more spatially correlated), regulated by the decorrelation length parameter, and these also intensify around areas of strong convective activity. Nevertheless, the spatial scale of perturbations outside the areas of most intense convective growth is larger than for the microphysics perturbations. Besides design differences, the variations between the two methods can be attributed to PBL processes, which have larger spatial correlation scales than microphysics. The combination of stochastic schemes introduces perturbations of different spatial scales, which eventually evolve similarly to MPRP and SPPT perturbations in the neighborhood of active convective development (Figs. 6 and 7). This result is consistent with studies that investigate upscale perturbation growth from convectivescales, which is much enhanced in areas with deep moist convection (Selz and Craig, 2015;Zhang, 2019).
The spatial characteristics of the perturbation can be quantified using two parameters -log ρ and ω 2 -representing the amplitude and the variance of the perturbation, respectively. These quantities were introduced by Primo et al. (2008), and mathematically correspond to the geometrical mean of the perturbation (log ρ) and the variance (ω 2 ): where N is the number of grid points with non-zero perturbation |δx i |. For stochastic methods, δx i is computed as the difference between the stochastically perturbed ensemble member and the corresponding member of MPS with the same initial conditions and physical parameterizations. For MPS, δx i is computed as the anomaly with respect to the ensemble mean, as there is no unperturbed reference for this experiment. δx i is computed considering all model levels below 5000 m and the four state variables (T, u, v and q). Since there are more model levels near the surface, boundary layer perturbations are more represented in δx i . Potential temperature and specific humidity perturbation are rescaled to ensure dimensionality consistence: where C p is the air specific heat capacity, L the latent heat of condensation, T ref = 300 K, and factor w q = 0.1 is introduced to level the magnitude of humidity perturbations with the rest.
Although ω 2 is defined as the variance of perturbation amplitude, it can be interpreted as a measure of localization. Admittedly, ω 2 does not provide a universal perturbation scale characterization, but the value of this quantity (i.e., perturbation amplitude dispersion) for a particular dynamical system -defined by the model and the domain numerical configuration -is related to its scale, and to the spatial correlations generated by the dynamics of the system (Primo et al., 2008). Low (high) ω 2 imply large (small) scale perturbations with respect to domain size.
The evolution of perturbations shows a rapid amplitude growth for MPRP perturbations, which are highly localized during the first hours of simulation (i.e., large ω 2 ). The two ensembles including SPPT display virtually identical development, starting from larger scale perturbations, conditioned to the spatial decorrelation scale parameter, and growing at a smaller rate than MPRP (Fig. 8). The saturation level is higher for SPPT, as perturbation amplitudes are intense over larger areas compared to MPRP, which translates into higher log ρ and lower ω 2 . This behavior is consistent with the conventional scale-dependent evolution of perturbations (e.g., Toth and Kalnay, 1993). Nonetheless, saturation variations between perturbations with different initial spatial correlations are slight and of minor importance over areas with deep moist convection. Therefore, diversity in log ρ originates in areas distant from convective activity. When SPPT and MPRP perturbations are combined, the former dominates the scale and amplitude of the full perturbation, so that negligible differences are obtained for log ρ and ω 2 . MPS perturbations display the largest amplitude and spatial correlation, although the procedure to compute these perturbations is fundamentally different from the other experiments.
The modification of convective instability caused by stochastic perturbations has an influence on precipitation. As an illustrative example, changes produced in the above analyzed members with respect to the unperturbed MPS simulation are inspected and qualitatively compared to the 6-h accumulated rainfall errors (Figs. 9a and 10a). These fields are obtained by using radar estimated precipitation. Minor variations are identified during phases 1 and 2. However, they are substantial at the onset of phase 3, during the last hours of 12 September. The unperturbed member produces precipitation in areas of southern Murcia and Almería, with notable location errors with respect to observations. Stochastic perturbations introduce modifications in the forecasted precipitation field, with scales compatible with these localization errors (Fig. 9b-d).
In this example, the methodologies investigated drive the most intense A. Hermoso et al. rainfall areas southward and eastward, closer to observations. During the first hours of 13 September, SPPT and MPRP members produce a more accurate forecast around the València coastline and over southeast Murcia (Fig. 10b,c). In this case, the combination of different stochastic perturbations causes a degradation of the precipitation forecast in some areas of València, where excessive rainfall is already predicted by the unperturbed member, and an additional increment is introduced (Fig. 10d).
Therefore, stochastic perturbations tend to grow in areas with deep moist convection and introduce modifications to the thermodynamic environment. These changes produce alterations in precipitation of the same scale as the model errors, and thus this methodology has a great potential to sample precipitation uncertainties, including extremes. In addition to the useful information distilled from this analysis, a precise probabilistic verification must be conducted to comprehensively assess the performance of each ensemble strategy investigated.

Probabilistic scores
Given the promising diversity of outcomes generated by the stochastic strategies and the effective perturbation growth characteristics, a quantitive assessment is carried out by means of the probabilistic verification scores introduced in Section 3.2. Since they are only computed for an individual heavy precipitation episode, which implies a certain level of correlation between verification locations, the analysis of these scores is not generalizable to other cases in a strict sense.
During phases 1 and 2, multiphysics exhibits the lowest Brier score for all considered thresholds, and thus negative BSS are obtained for the stochastic ensembles tested (Fig. 11a-c). The area affected by the linear precipitation structure is better captured by the multiphysics-based ensemble (Fig. 12). However, differences are only important at the 20 mm threshold. The first phase is strongly influenced by an elongated area of high EPT over the western Mediterranean. Uncertainties related to initial conditions play a primary role at this stage, and variability introduced by stochastic techniques is modest. Concerning Brier score components -reliability and resolution -modest variations among experiments are found, especially for the largest threshold.
These low contrasts between experiments are propagated to the second phase. This period is primarily governed by the southward evolution of the high EPT intrusion. At the beginning of this phase, low forecast probability is found over large areas of the occurrence of the binary event over the southern València region (Fig. 12a,b), resulting in low reliability. During the subsequent evolution, maximum values of probability are moderate (around 0.5-0.6). Thus, ensemble ability to distinguish between the occurrence and non-occurrence of the event is low, leading to low forecast resolution (Fig. 12c,d).
Additionally, ROC areas show better discrimination skill for stochastically perturbed ensembles for the 50 mm threshold during phases 1 and 2, especially for ensembles with microphysics perturbations, owing to higher hit rates and probability maxima. This fact is related to the larger rainfall extremes produced by these ensembles. Despite this increase in high values, areal accumulated precipitation is slightly lower than for the unperturbed simulations. By contrast, the skill in terms of discrimination is better for MPS at the low thresholds, due to a lower proportion of false alarms (Fig. 13a-c). Stochastic ensembles slightly increase the probability of the event in areas of non-occurrence and reduce the probability in areas of occurrence. Singularly, the false alarm A. Hermoso et al. increment effect for MPRP is lower than for the other stochastic ensembles during phase 1. Regarding the first part of phase 2, hit rate and forecasted probabilities are low, while during the subsequent hours both hit and false alarm rates increase, with fewer false alarms for MPS. The stochastic modifications introduced are not sufficient to capture the formation of the convective line that distinguish this stage.
By contrast, during phase 3 differences between experiments are higher (Fig. 11). This period is characterized by the formation of a quasistationary MCS over an area with complex topography, which produced persistent and intense precipitation over Murcia. At this stage, model error is more influential than in the previous phases and significant differences between multiphysics and stochastic techniques are attained. During the last part of the first simulation, stochastic methods including microphysics perturbations yield the best performance, which is consistent with the intense effect of microphysics tendencies during this phase (Fig. 2). The area of highest probability of occurrence is more concentrated for ensembles with stochastic parameterizations (Fig. 14). Reliability is improved by a better correspondence between forecasted probabilities and observations. Moreover, larger probabilities are forecasted, especially by stochastic ensembles (Fig. 14b), which leads to an increased resolution.
This improvement of stochastic approaches with respect to multiphysics extends to the first part of the second simulation, although in this case no relevant differences are obtained between the various stochastic strategies. At this time, variations are mostly attributable to resolution modifications. Indeed, forecasted probabilities > 0.9 are produced by MPRP over an elongated area of approximately 100 km affected by the MCS (Fig. 14d). Conversely, during the northward evolution of the MCS, skill is slightly better for multiphysics, mainly due to a better reliability. The introduction of stochastic perturbations produces a clear enhancement of ensemble performance in terms of discrimination between the occurrence and non-occurrence of the selected binary events for the period characterized by the quasi-stationary MCS (Fig. 13c-e), as MPS produces more false alarms compared to stochastic methods at this stage. In the course of the northward evolution of this system, differences among ensembles reduce.
The combination of multiple stochastic perturbations does not translate into a clear improvement of probabilistic forecast skill for this Fig. 9. a) Difference between 6-h accumulated precipitation (12 September 18-13 September 00 UTC) for an unperturbed member and radar estimated rainfall, and b)-d) modifications to the unperturbed member produced by the distinct stochastic techniques. Locations of València, Murcia and Almería radars are represented by a purple dot, triangle, and square, respectively. The area outside radar coverage is grayed out in panel a).
episode. Although during the last hours of the episode, ensembles containing a combination of stochastic perturbations render the best efficiency -especially for high precipitation thresholds -this behavior is not uniform along the episode. This contrasts with the results of other studies of ensembles based on stochastic parameterizations (e.g, Jankov et al., 2017Jankov et al., , 2019Xu et al., 2020). The nonlinear interactions which operate at convective scale, especially in heavy precipitation cases can counteract the individual positive effects of each perturbation, resulting in a degraded forecast. Moreover, the fast perturbation growth characteristic at small scales can induce rapid saturation, which is not overcome by including additional perturbations. Furthermore, it must be taken into account that conclusions derived from an evaluation over an extended benchmarking period, as carried out in the above mentioned studies, may not be applicable over a particular case study.

Ensemble features at catchment scale
The quantitative scores provide a general overview of ensemble performance across the verification area. However, the particular characteristics of the watersheds present in the region demand the generation of accurate QPFs at remarkably small scales in order to be able to estimate the potential flooding risk and taking appropriate protection measures. In this subsection, the forecasted maximum hourly precipitation intensities in 6-h intervals are analyzed and compared to rain gauge values placed inside eight small-to-medium sized catchments (Fig. 15). Owing to the limited number of basins, this evaluation is not centered on the ability of each ensemble to reproduce the probability density function compatible with the observed state, but on the ability of the ensembles to reproduce observed rainfall intensities at very local scales. Station data is used for this evaluation as it is generally more accurate than radar estimates and a smaller dataset is adequate for an analysis at specific basins.
During phase 1, the observed maximum intensity is reproduced by all ensembles at all the considered catchments, except Beniarrés, where a remarkable extreme value of almost 70 mm in 1 h was recorded (Fig. 16a). Ensemble diversity is similar in most watersheds, excluding Moixent and Bellús. At the former, microphysics perturbations increase spread compared to SPPT, while at the latter, stochastic techniques render greater diversity. This is in agreement with the general behavior previously identified. During the first part of phase 2, some members of the stochastic ensembles capture the high intensity registered in Beniarrés. Hence, the higher spread of these ensembles has also a  Fig. 9 for 13 September 00-06 UTC.
A. Hermoso et al. relevant impact at local scale. Nevertheless, diversity is large at some catchments (Albujón, Benipila, Salada), which would render some false alarms for this particular episode. In the subsequent evolution, the largest spread is localized over the central and southern basins, although none of the stochastic methods proposed is able to simulate the 15 mm h − 1 recorded in Salada. It should be taken into consideration that this period corresponds to 18-30 h lead time for the first simulation, and for such long lead times, localization errors tend to be high. Thus, the frequency of misses and false alarms increases, especially at the small scales examined in this analysis.
During phase 3 (Fig. 16e, f), observed maximum intensities are captured by some ensemble members in all the experiments. For the Albujón, Benipila and Beniarrés basins, the highest ensemble percentiles display significant extreme values > 60 mm h − 1 , particularly SPPT. Nevertheless, the combination of this perturbation with other stochastic perturbations results in a reduction of these extremes, indicative of counteracting effects produced by each perturbation. Conversely, false alarms are obtained in some catchments where low rainfall rates were registered during the first hours. Stochastic perturbations reduce the excess produced by MPS in Moixent, while similar performance is obtained in Bellús (Fig. 16e). During this period, the major convective activity is located southward associated with the MCS in Murcia and the Fig. 11. BSS for 3-h accumulated precipitation as a function of lead time for the first (a-c) and the second simulations (d-f) for 20, 35 and 50 mm thresholds. Colored squares indicate BSS significantly different from 0 (i.e., significant improvement or deterioration with respect to MPS) at the 95% confidence level for the case study considered.   Fig. 11 for the area under the ROC. Bicolored circles represent significant differences between experiments identified by those colors at the 95% confidence level computed as described in the text.

Fig. 14. As in
A. Hermoso et al. convective bands in Almería, while only small precipitation structures developed near Cap de la Nau, where the Beniarrés, Bellús and Moixent catchments are placed. Slight intensities were registered over Bellús and Moixent, but due to localization errors, several members generate substantial rainfall on these basins. Stochastic ensembles produce lower onshore precipitation than MPS (Fig. 5a-d), which is consistent with the lower rainfall intensities forecasted in Moixent. During the last part of the episode, maximum hourly precipitation intensities over Beniarrés, Moixent and Bellús are captured by some ensemble members. Excluding Beniarrés, diversity is larger for stochastic techniques, considering only the basins where substantial intensities were recorded. In brief, the increased spread displayed by stochastic ensembles at the scale of the study area is also reflected at catchment scale. Overall, the diversity generated by all tested techniques is enough to represent the observed maximum hourly intensities, except for a particularly extreme case of Beniarrés during phase 1 with 67 mm h − 1 .

Conclusions
Stimulated by the typical deficiencies found in many convectivescale probabilistic forecasts concerning underdispersion, an examination of multiple methodologies to cope with model uncertainties for a heavy precipitation episode in eastern Spain is conducted. Given the importance of moist processes in such events and the presence of prominent orography near coastal areas in the study region, it is hypothesized that specific parameterized mechanisms, namely microphysics and PBL, played a significant role in the evolution of the episode. Therefore, accounting for inaccuracies in the formulation of these processes is a requirement for the improvement of probabilistic hydrometeorological forecasting chains, which require precise quantitative precipitation forecasts in order to issue valuable civil protection warnings to mitigate personal and material losses.
In this context, the performance of 50-member ensembles based on multiphysics and different stochastic methods is evaluated for the heavy precipitation episode of 12 and 13 September over eastern Spain. Stochastic experiments include the version of SPPT implemented in WRF (Berner et al., 2015), and an adaptation of the Random Parameters scheme of McCabe et al. (2016) applied to the microphysics parameterization (MPRP).
Microphysics processes are found to hold a significant influence in the development of the episode, as ensemble spread is substantially increased when uncertainties in this parameterization are considered. In this regard, the suitability of the assumptions based on regional specificities and physical considerations, which consider the selected parameters (CCN concentration, fall speed factor and saturation percentage for cloud formation) as influential factors is proven. In general terms, stochastic ensembles improve upon the diversity generated by the multiphysics technique, so resulting in a more comprehensive strategy to sample uncertainties at these scales and improve the forecast of extreme phenomena.
Perturbations generated by the multiple methodologies inspected present diverse characteristics. Microphysics perturbations initially feature low spatial correlation. However, they intensively grow over areas with intense convective instability and gain spatial correlation, while they decay in other areas. Conversely, SPPT perturbations introduce modifications with a spatial correlation determined by the parameter specified in the scheme. As for MPRP, perturbations are intensified in areas with deep moist convection, but also in other regions. Hence, variations produced by MPRP are focused over the most influential areas. The changes introduced produce variations in the precipitation field at the scale of the errors, indicating the potential of stochastic perturbation to improve rainfall forecasts.
Stochastic experiments exhibit better verification scores compared to mutiphysics during the last phase of the episode, characterized by the intense influence of local orography and convective development, mainly initiated over the sea. In this situation better reliability and discrimination -increasing hit rates and reducing false alarms -is obtained for stochastic methods with negligible differences among specific strategies. This finding reveals the benefits of perturbing subgrid processes under circumstances wherein small-scale mechanisms play a key role. By contrast, in the previous phases, which are dominated by larger scale structures differences are lower and multiphysics exhibits better skill, especially for low precipitation thresholds.
The combination of various stochastic perturbations does not present overall significant benefits compared to the use of simple schemes. The nonlinearities that dominate convective-scale dynamics and the fast perturbation saturation may prevent an additive effect when multiple techniques are combined.
The increase in spread caused by stochastic methods is also translated to local scales. All experiments demonstrate the capacity to generate scenarios that capture the maximum hourly precipitation intensity over some small-to-medium size watersheds in the study area. This diagnosis is remarkable for hydrometeorological forecasting, which demands accurate rainfall forecasts at small scales of the order of basin size. Despite this, some extreme observed intensities are not forecasted, which emphasizes the difficulties found to design ensemble strategies that seamlessly sample the entire range of possible scenarios compatible with the uncertainties associated with the initial state and model formulation.
The potential benefits in terms of diversity and skill displayed by the stochastic techniques, and in particular by the microphysics perturbations, present several avenues for further research. Admittedly, the application of these techniques to a single episode limits the significance of the results, and the identified differences in forecast skill may not be representative of ensemble performance in other cases. Nevertheless, the encouraging results obtained in this study lay the basis for a wider investigation including an extended period or a collection of highimpact episodes in order to pursue statistically significant results. In addition, a thorough examination of the particular physical processes modified by stochastic perturbations that lead to an increased ensemble spread could help understanding the behavior of stochastic perturbation methods and making more informed configuration choices that would allow us to extract a greater benefit from stochastic techniques and ensemble prediction systems. Analyzing the impact of perturbations to additional microphysics and PBL parameters could contribute to this goal.
Despite the positive impact of stochastic perturbations to microphysics, the effect cannot be extrapolated in a straightforward manner to other relevant processes or variables for heavy precipitation and flash flooding. The application of stochastic perturbations to sea surface temperature was also attempted, as previous studies have shown its influence on precipitation intensity and location (e.g., Stocchi and Davolio, 2017;Ivatek-Šahdan et al., 2018;Iizuka and Nakamura, 2019), and also its impacts on hydrometeorlogical forecasts over small-tomedium size catchments (Senatore et al., 2020). However, only marginal effects were obtained either from exclusively perturbing this field or in combination with the other methods investigated in this study.
Furthermore, initial and boundary conditions should also be included in order to sample all relevant sources of uncertainty. The scarcity of in-situ observations over the sea is extremely influential in the Mediterranean basin. Indeed, many convective systems that ultimately affect coastal areas originate over maritime areas. Therefore, the characterization of uncertainty over the sea and the adoption of techniques that allow the translation of information from land towards Fig. 16. Maximum hourly precipitation intensities for different 6-h periods over the location of rain gauges depicted in Fig. 15 provided by the four ensemble configurations defined. Boxes represent interquartile ranges and horizontal lines inside boxes indicate ensemble median. Black horizontal lines stand for observed values of maximum hourly rainfall intensity. maritime areas is crucial. These error sources are not considered in this study so as to better isolate the impacts of model error sampling strategies. However, the combined effect of initial and boundary condition and model perturbations in a highly nonlinear framework may provide interesting conclusions for the development and improvement of convection-permitting ensembles.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.