1 Introduction

Fulfilling water demand in situations of water scarcity is one of the major challenges faced by Water Utilities (WUs). Prolonged droughts due to alterations in climate are among the causes of water scarcity, particularly in the Mediterranean area, which is one of the regions most vulnerable to climate alterations (IPCC 2013; Forestieri et al. 2018). Climate alterations have direct impacts on the surface water balance and groundwater recharge (Arnone et al. 2018), and thus on reservoir inputs. In such conditions, the management of water resources becomes a challenging task for WUs (Wilhite et al. 2007). However, WUs' management routines scarcely consider climate information and, when they do, they are based on the stationarity assumption. Therefore, their response to water shortage relies on short-term strategies aimed at reducing water consumption and improving watershed management, rather than on comprehensive planning for the long-term consequences of climate change.

When WUs plan their operations for the next relevant period (a few months to a year), as well as their long-term infrastructure investments (a few years to decades), the standard approach is to rely on historical weather data (e.g., Danilenko et al. 2010) to derive probability distributions and to estimate the likelihood of future events (e.g., occurrence of a certain amount of precipitation in a certain period of the year). Specifically, the approach used to assess anomalies in precipitation and drought is based on the derivation of standardized indices, such as the Standardized Precipitation Index (SPI, McKee et al. 1993; Cancelliere et al. 2007; Bonaccorso et al. 2015), and its modifications, e.g. the Standardized Precipitation Evapotranspiration Index (SPEI, Vicente-Serrano et al. 2010). Information from large-scale atmospheric circulation patterns, e.g. the North Atlantic Oscillation (NAO), has been successfully adopted as a predictor of the future SPI to develop forecasting models of drought transition probabilities at short and medium terms (Cutore et al. 2009; Bonaccorso et al. 2015). The assumption underlying the above-described approaches is that the climate system remains essentially stationary over the timescales of interest. In a situation of increasing climate volatility, this assumption is no longer viable.

Inaction on water supply systems by the WUs, especially in the Mediterranean area, may imply higher costs of coping with emergencies or higher costs for the provision of a given standard of service with the existing infrastructure. An interesting example, in this sense, is the case of Rome Municipality, in Italy, where the WU ACEA experienced a serious water shortage in summer 2017 during the extreme heat wave that hit Southern Europe (WWA 2017). Years 2001 and 2016 also recorded precipitation anomalies in Southern-Italy, especially in Sicily (SIAS 2002, 2016; ISPRA 2001, 2016).

Getting a reliable early assessment of near-future rainfall anomalies would provide a valid support for the evaluation of water availability and the management of the related water supply system.

Seasonal forecasts (SFs) may offer a powerful tool for guiding the strategic planning of resources across several climate-sensitive sectors (e.g. De Felice et al. 2015; Viel et al. 2016). Over Europe, practical applications of seasonal forecasts were rare until the last decade, mainly due to their uncertain skill in this region (Doblas-Reyes 2012). To fill the gap, the EUPORIAS project promoted the use of climate information for decision support by involving both providers and potential users of seasonal data (Buontempo et al. 2017). It was demonstrated that seasonal forecasts may give important contributions in the fields of drought-risk assessment and mid-term reservoir management (Viel et al. 2016; Crochemore et al. 2017). A synoptic overview of the current applications of climate information across different sectors in Europe is given by Soares et al. (2018). Today, seasonal forecasts and climate projections are rather commonly used in agriculture, energy, water and ‘other’ sectors (including the environment, weather and climate change, industry and research sectors). For instance, Clark et al. (2017) successfully explored the skill of seasonal forecasts for the wind energy industry over Europe. Specifically, they demonstrated the good predictability of the selected variables, i.e. wind speed and temperature, through the NAO index. The potential of seasonal forecasting in the prediction of reservoir-hydrological variables (e.g. reservoir monthly volume) has been demonstrated by Marcos et al. (2017) on a river basin in Spain based on multilinear regression models. Arnal et al. (2018) conducted an in-depth investigation over Europe on the advantages of using seasonal forecasts for predicting the streamflow at seasonal scale as compared with the use of classical historical meteorological observations. They analyzed the skill of the operational EFAS (European Flood Awareness System) seasonal streamflow forecasts against the ESP (Ensemble Streamflow Prediction) forecasting approach, finding that the predictability varies in space and time.

In this study, we propose a methodology based on seasonal forecast data to assess the probability of drought occurrence in the future, over a mid-term horizon that spans from 1 to 7 months. The cumulated precipitation over different time windows is analyzed to assess the drought occurrence probability. Based on this information, local water resources managers can evaluate changes in the expected future water availability. To design and then validate the methodology, we conducted two case studies, one in Zakynthos (Greece) and one in Sicily (Italy). The selected areas belong to two of the countries that suffer most from changes in climate and water scarcity.

The structure of the manuscript is the following: section 2 describes the methodology, i.e. the types of the adopted data (2.1 and 2.2) and the developed procedure (2.3 and 2.4); section 3 introduces the two case studies; in section 4 the results are discussed; finally, section 5 concludes the manuscript.

2 Methodology

2.1 Seasonal Forecasts

The methodology builds on two datasets: SF and reanalysis data.

SFs are climate predictions for the next few months, starting from any initial date. They are produced with numerical models of the climate system that are very similar to those adopted for forecasting the weather of the next few days (Hoskins 2012). Unlike weather forecasts, SFs directly predict both the slow component of the climate system (i.e. the ocean) and the fast component (i.e. the atmosphere).

In seasonal climate predictions, as in any chaotic system, tiny variations of the initial state may lead to diverging trajectories in a relatively small amount of time. A number of predictions, known as an ensemble, is performed with the numerical climate model, and each result is defined as a member of the ensemble. The initial states of the ensemble members differ only slightly, and the spread is consistent with the observation uncertainties. However, the predictions might differ substantially further ahead in the future. In this way, it is possible to sample the forecast uncertainty caused by the intrinsic uncertainty of the initialization.

A characteristic of the SF is the lead time, LT, which indicates the ‘time distance’ between the issuance of the forecast and the occurrence of the phenomena that are predicted. Value 0 means that the forecast target period begins the same month of the release; value 3 means that target period begins 3 months after the release.
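As a minimal illustration of the lead-time convention, the following Python sketch returns the month in which the target period begins, given the release month and the LT. The function name `target_start_month` and the 1-12 month numbering are assumptions introduced for this example only.

```python
def target_start_month(release_month, lead_time):
    """Return the month (1-12) in which the forecast target period begins.

    release_month: month of issuance of the forecast (1 = January)
    lead_time:     LT in months; 0 means the target starts in the release month
    """
    if not 1 <= release_month <= 12:
        raise ValueError("release_month must be between 1 and 12")
    if lead_time < 0:
        raise ValueError("lead_time must be non-negative")
    # Shift to a 0-based month index, add the lead time, wrap around the year
    return (release_month - 1 + lead_time) % 12 + 1
```

For example, a forecast released in December with LT 3 targets a period starting in March.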

The entire data normally includes two types of dataset: forecasts in real-time, which are those updated to the present, and retrospective forecasts (hindcasts) initialized at equivalent intervals and necessary to validate and test the skill of the forecast using historical data. The two datasets can be characterized by a different number of members.

SFs are released by various climate centers. This study uses the SF belonging to the System 5 (SEAS5) archive, released by the European Centre for Medium-Range Weather Forecasts (ECMWF) and made available by the data access system of Copernicus Climate Data Store (CDS).

We use the total precipitation at surface daily data available at the following link: https://cds.climate.copernicus.eu/cdsapp#!/dataset/seasonal-original-single-levels. The dataset has a global coverage and a spatial resolution of 1°×1°, and it includes forecasts in real-time (since 2017) and hindcasts initialized in the period 1986–2016. Real-time forecasts consist of a 51-member ensemble, generated using a combination of SST and atmospheric initial condition perturbations and the activation of stochastic physics. The runs are 7-months long and are released on the 5th day of each month at 12:00 UTC. Therefore, LT goes from 0 to 6. The hindcast datasets have the same characteristics but consist of 25-member ensembles.

2.2 Reanalyses

Reanalysis data are becoming common in hydrological and, in general, impact modeling. They derive from the combination of forecast models and data assimilation systems, used to reanalyze archived observations. The main advantage of reanalysis data is their continuity in space and time, since they are produced at global scale and do not have gaps or missing records. Similar to the SFs, they are produced on a regular grid by the same climate centers.

In this work, reanalysis data serve as a reference to derive the site-specific climatic characteristics, to assess the skill of the SFs and to estimate the anomalies as compared to the expected statistics. Reanalysis data are assumed as a surrogate of observations. The development of ad hoc statistical metrics for relative comparison with historical observations made it possible to avoid data correction procedures (e.g. bias correction).

The daily total precipitation data have been retrieved from the ERA-Interim dataset. Data are available from 1979 to present, and the spatial resolution is 0.75° × 0.75°, i.e. approximately 80 km.

2.3 Drought Assessment

The likelihood of a drought occurrence in the upcoming season is assessed at monthly scale over a defined time window. The cumulated precipitation is the main input variable, which is computed and verified by comparison with the climatology over the same time window, in order to define a ‘climate state’.

Let us denote the time window as the cumulation period, CPm, where m indicates the number of months, which may vary from 1 to 6, i.e. over all possible LTs. Once a CPm is selected, precipitation is cumulated over a rolling window of length CPm: as an example, CP3 leads to a three-month rolling window, targeting the periods January–February–March (JFM), February–March–April (FMA), and so on.
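The rolling cumulation can be sketched in a few lines of Python. The snippet below sums a monthly precipitation series over every window of m consecutive months and labels each window with the initials of its months. The function names (`rolling_cumulation`, `window_label`) and the input format (a plain list of monthly totals) are assumptions made for illustration and are not part of the original procedure.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def rolling_cumulation(monthly_precip, cp):
    """Sum monthly precipitation over every window of `cp` consecutive months."""
    if cp < 1 or cp > len(monthly_precip):
        raise ValueError("cp must be between 1 and the series length")
    return [sum(monthly_precip[i:i + cp])
            for i in range(len(monthly_precip) - cp + 1)]

def window_label(first_month, cp):
    """Label a window by its month initials, e.g. first_month=1, cp=3 gives 'JFM'."""
    return "".join(MONTHS[(first_month - 1 + k) % 12][0].upper()
                   for k in range(cp))
```

With m = 3, the windows starting in January and February are labelled JFM and FMA, as in the example above.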

The statistical properties characterizing the cumulated precipitation in each period are derived from the SF hindcast data over a predefined number of years, Ntot (Fig. 1). For each CPm, a time series of length Ntot is created and the terciles of its probability distribution are derived. Two classes of precipitation are derived: ‘dry’, for cumulated precipitation below the 1st tercile; ‘not-dry’ for cumulated precipitation above the 1st tercile.
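A minimal sketch of the tercile-based classification is given below. It assumes that the hindcast series for a given CPm is available as a plain list of cumulated precipitation values, and it uses the default quantile estimator of the Python standard library, which is not necessarily the estimator adopted in the study. The function names are hypothetical, and the treatment of a value exactly equal to the tercile (assigned here to the not-dry class) is also an assumption.

```python
import statistics

def first_tercile(cumulated_series):
    """Return the lower tercile of a series of cumulated precipitation values."""
    # statistics.quantiles with n=3 returns the two cut points of the terciles
    lower, _upper = statistics.quantiles(cumulated_series, n=3)
    return lower

def precipitation_class(value, lower_tercile):
    """Classify a cumulated precipitation value as 'dry' or 'not-dry'."""
    # Assumption: a value exactly equal to the tercile is treated as not-dry
    return "dry" if value < lower_tercile else "not-dry"
```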

Fig. 1 Steps to compute the discriminant threshold frequency and the maximum false rate for a given CPm-LT combination. Hindcast data of SF and reanalysis data over a test period are used

The methodology then requires that the real-time seasonal predictions (for the near future) for the selected CPm are verified against the climatology. The ‘drought’ alert is released when the frequency of ensemble members that fall within the dry class is sufficiently high, according to a probabilistic approach. This involves a classification procedure.

Theoretically, the frequency threshold that discriminates between the dry and not-dry classes is equal to 1/3, corresponding to the tercile frequency. In practice, the optimal value of this threshold is derived from a calibration of hindcast data against observations. This allows the computation of (i) the skill of the classification over past predictions, (ii) the optimal discriminant threshold frequency, and (iii) a curve to assess the reliability of future predictions (see next section). The entire procedure is depicted in Fig. 1. It is based on the derivation of the sensitivity/specificity variables and Receiver Operating Characteristic (ROC) curves (Fawcett 2006), widely used in the literature and in SF approaches to quantify the goodness of a prediction model (e.g. Vogel et al. 2018; Hyvärinen et al. 2015). In particular, the sensitivity evaluates the true positive rate, i.e. the agreement between observations and predictions in correctly classifying an occurrence within the dry class. The specificity evaluates the true negative rate, i.e. the agreement between observations and predictions in correctly discarding an occurrence from the dry class:

$$ \mathrm{sensitivity}=\frac{TP}{TP+FN} $$
(1)
$$ \mathrm{specificity}=\frac{TN}{TN+FP} $$
(2)

where TP and TN indicate the true positives and true negatives, and FN and FP the false negatives and false positives, respectively. Sensitivity and specificity quantify the ability of the system to correctly identify the reference observations as belonging to the dry and not-dry classes, respectively. They are calculated from the classification tables by varying the cutoff threshold, which is the threshold frequency of the ensemble members that determines whether the predicted cumulated precipitation falls within the dry or not-dry class.
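For a single cutoff threshold, Eqs. 1 and 2 can be evaluated as in the following Python sketch. It assumes that, for each past forecast, the fraction of ensemble members in the dry class and a flag telling whether the reference observation was dry are available as plain lists; the function name and argument names are hypothetical.

```python
def sensitivity_specificity(dry_fractions, observed_dry, cutoff):
    """Return (sensitivity, specificity) for one cutoff threshold.

    dry_fractions: fraction of ensemble members in the dry class, one per forecast
    observed_dry:  True where the reference observation fell in the dry class
    cutoff:        a forecast is classified as dry when its fraction is >= cutoff
    """
    tp = fp = tn = fn = 0
    for fq, is_dry in zip(dry_fractions, observed_dry):
        predicted_dry = fq >= cutoff
        if predicted_dry and is_dry:
            tp += 1          # drought predicted and observed
        elif predicted_dry and not is_dry:
            fp += 1          # false alarm
        elif not predicted_dry and is_dry:
            fn += 1          # missed drought
        else:
            tn += 1          # no drought predicted, none observed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both dry and not-dry observations are required")
    return tp / (tp + fn), tn / (tn + fp)
```

Repeating the call over a range of cutoff values yields the sensitivity and specificity curves.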

The intersection between the sensitivity and specificity curves identifies the threshold frequency that optimizes the capability of the algorithm to correctly attribute forecasts to the dry and not-dry classes, i.e. the one that minimizes the rate of the two types of erroneous classifications. The definition agrees with one of the methods available in the literature, which identifies the optimal cut-off threshold as the value corresponding to the intersection of the ROC curve with the −45 degree line (Sanchez 2017). Other methods are based on the Youden index (Youden 1950) and the minimum distance from the (1,0) point to the ROC curve (Kumar and Indrayan 2011), which lead, in most cases, to similar results. We denote the optimal cut-off threshold as the Discriminant Threshold Frequency (DTF).
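On a discrete grid of cutoff values, the intersection of the two curves can be approximated by the cutoff at which sensitivity and specificity are closest to each other. The following Python sketch illustrates this idea; the grid resolution (`steps`), the tie-breaking rule (the smallest such cutoff is returned) and all function names are assumptions of this example, not details taken from the study.

```python
def rates(dry_fractions, observed_dry, cutoff):
    """Sensitivity and specificity when 'dry' is predicted for fq >= cutoff."""
    pairs = list(zip(dry_fractions, observed_dry))
    tp = sum(1 for fq, obs in pairs if obs and fq >= cutoff)
    fn = sum(1 for fq, obs in pairs if obs and fq < cutoff)
    tn = sum(1 for fq, obs in pairs if not obs and fq < cutoff)
    fp = sum(1 for fq, obs in pairs if not obs and fq >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def discriminant_threshold(dry_fractions, observed_dry, steps=20):
    """Approximate the DTF as the cutoff where the two curves are closest."""
    if not any(observed_dry) or all(observed_dry):
        raise ValueError("both dry and not-dry observations are required")
    cutoffs = [k / steps for k in range(steps + 1)]

    def gap(cutoff):
        sens, spec = rates(dry_fractions, observed_dry, cutoff)
        return abs(sens - spec)

    # min() returns the first (smallest) cutoff among those with the minimal gap
    return min(cutoffs, key=gap)
```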

The procedure applies to all 21 possible CPm-LT combinations (CP1:6-LT0, CP1:5-LT1, CP1:4-LT2, CP1:3-LT3, CP1:2-LT4, CP1-LT5), each of which is characterized by its own sensitivity/specificity curves and DTF. The resulting value is site- and CPm-LT-specific.
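The count of 21 combinations follows from requiring that the lead time plus the cumulation period does not exceed 6 months, as in the list above. A short Python enumeration (the function name and the `max_months` parameter are illustrative assumptions) makes this explicit:

```python
def cp_lt_combinations(max_months=6):
    """List the (CP, LT) pairs such that LT + CP <= max_months."""
    return [(cp, lt)
            for lt in range(max_months)                # LT0 ... LT5
            for cp in range(1, max_months - lt + 1)]   # CP1 ... CP(6 - LT)

combinations = cp_lt_combinations()
print(len(combinations))
```

The pairs range from (CP6, LT0) down to the single pair (CP1, LT5) at the longest lead time.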

Ultimately, in the operational phase and with real-time SFs, the climate state (drought or not) for an upcoming CPm, at lead time LT, results from the comparison between the frequency of ensemble members of the forecast dataset that fall within the dry class of the corresponding CPm and LT (fqCPm,LT), and the DTF associated with the given CPm-LT, DTFCPm,LT:

  • fqCPm,LT ≥ DTFCPm,LT implies drought;

  • fqCPm,LT < DTFCPm,LT implies normal conditions.
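The decision rule above can be sketched as follows. The snippet assumes that the cumulated precipitation of each ensemble member and the dry-class threshold (the first tercile of the hindcast climatology) are already available; the function names and the returned labels are hypothetical.

```python
def dry_fraction(member_totals, dry_threshold):
    """Fraction of ensemble members whose cumulated precipitation is in the dry class."""
    if not member_totals:
        raise ValueError("at least one ensemble member is required")
    dry_members = sum(1 for total in member_totals if total < dry_threshold)
    return dry_members / len(member_totals)

def climate_state(fq, dtf):
    """Apply the rule: fq >= DTF implies drought, otherwise normal conditions."""
    return "drought" if fq >= dtf else "normal"
```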

The release of the climate state is updated monthly, once the seasonal forecasts are released.

The methodology has been developed for a single grid cell. However, the use of multiple cells does not significantly alter the procedure or its reliability.

2.4 Reliability Assessment

The uncertainty associated to the monthly release of the drought assessment is evaluated in terms of classification reliability. Specifically, when a drought is predicted, we want to evaluate how confident we are that the information is not wrong and what the chances are that the methodology is missing alerts or giving false alarms.

The reliability definition builds on the classification procedure described in the previous section and is assessed in terms of the so-called false rate (FR, ranging from 0 to 100%). FR is a metric that depends on both the capability of the system to discriminate between the two classes and the reliability of the near-future precipitation forecasts. The FR is defined as a function of the actual frequency of ensemble members falling within the dry class (the previously defined fqCPm,LT, hereinafter simply fq, with DTFCPm,LT hereinafter simply DTF), and depends on the predicted climate state. Specifically, when a drought is predicted, FR(fq) indicates the expected frequency of false alarms (Eq. 3). Conversely, when no drought is predicted, FR(fq) indicates the expected frequency of missed drought alerts (Eq. 4).

$$ \left\{\begin{array}{ll} FR(fq)=1-\mathrm{specificity}(fq) & \mathrm{if}\ fq>\mathrm{DTF}\\ FR(fq)=1-\mathrm{sensitivity}(fq) & \mathrm{if}\ fq<\mathrm{DTF} \end{array}\right. $$
(3,4)
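Equations 3 and 4 can be evaluated from tabulated sensitivity and specificity curves, for instance as in the Python sketch below. The linear interpolation between tabulated cutoffs and the choice of the specificity branch at fq = DTF (consistent with the rule that fq ≥ DTF implies drought) are assumptions of this example; the function names are hypothetical.

```python
def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of tabulated (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            weight = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + weight * (ys[i] - ys[i - 1])

def false_rate(fq, dtf, cutoffs, sensitivity, specificity):
    """FR(fq): expected false-alarm rate if drought is predicted, missed-alert rate otherwise."""
    if fq >= dtf:
        # Drought predicted: false alarms are 1 - specificity
        return 1.0 - interpolate(cutoffs, specificity, fq)
    # No drought predicted: missed alerts are 1 - sensitivity
    return 1.0 - interpolate(cutoffs, sensitivity, fq)
```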

Figure 2 depicts how to derive FR(fq) from the sensitivity/specificity curves and over a forecast dataset.

Fig. 2 Definition of False Rate, FR(fq), and maxFR based on sensitivity and specificity

Let us assume that the frequency of ‘dry’ ensemble members is fq* ≥ DTF, so that the forecast is attributed to the dry class and the drought state is released (Fig. 2a). SpIn denotes the specificity value associated with fq* (red line). This value indicates that, over the reference dataset, attributing forecast ensembles characterized by the highlighted fq* value to the not-dry class resulted in a correct classification in a fraction SpIn of the cases. On the contrary, the quantity 1-SpIn indicates the fraction of cases in which the algorithm wrongly attributed the reference observations to the dry class (false positive, FP) in the presence of the same fq* value. Thus, it represents the expected frequency of false alarms.

Conversely, let us assume that the frequency of forecast members belonging to the dry class is fq** < DTF. In this case the forecast is discarded from the dry class (no drought). SnOut denotes the sensitivity value associated with fq** (green line) and indicates that the algorithm correctly attributed the precipitation occurrences to the dry class, over the reference dataset, with a frequency equal to SnOut (true positive, TP). The quantity 1-SnOut indicates the fraction of cases in which the algorithm wrongly discarded the precipitation occurrences from the dry class (false negative, FN) when they actually belonged to it. Thus, it represents the expected frequency of missed alerts.

Finally, at fq = DTF, FR(fq) reaches its maximum value, which corresponds to the maximum possible rate of false alarms and missed alerts (Fig. 2c). This value is denoted as maxFR; it characterizes each CPm-LT combination and is representative of the overall skill of the methodology. Indeed, maxFR provides an assessment of the chance of failure of the tool.
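Since the DTF is defined as the intersection of the sensitivity and specificity curves, the two branches of Eqs. 3 and 4 coincide at fq = DTF. Under the assumption that the two curves cross exactly at the DTF, maxFR can be restated compactly as:

$$ \mathrm{maxFR}=FR(\mathrm{DTF})=1-\mathrm{sensitivity}(\mathrm{DTF})=1-\mathrm{specificity}(\mathrm{DTF}) $$

As a purely hypothetical numerical example, a CPm-LT combination whose curves cross at a sensitivity and specificity of 0.70 would have maxFR = 0.30.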

3 Case Studies Description

The methodology was developed in the context of an EU Horizon 2020 Programme project, named CrossClimate, coordinated by the NEPTUNE Consortium. The key steps of the procedure have been defined within a co-design process, which was explicitly tailored to small and medium WUs of the Mediterranean region. The aim of the project was to support their water resources management and decision planning in situations of water scarcity and drought (Amigo s.r.l 2018; Arnone et al. 2020). To this end, two case studies have been conducted to (i) identify and test the key steps of the methodology, and (ii) validate the algorithm at a specific site.

Depending on the climatology characterizing the area and on the dominant water abstraction practices, some WUs may be interested in forecasts with different lead times or in different variables and climatic indices. Therefore, defining the WU needs is a necessary first step.

A first co-design process was conducted with the WU DEYA, located on the small island of Zakynthos, Greece (Fig. 3). This made it possible to assess the WU needs, mainly in terms of timing of plans, and to apply the algorithm in practice for the first time. A further case study was then conducted with the WU Siciliacque S.p.A., located in Sicily, Italy (Fig. 3). This WU is characterized by an integrated type of service, much greater water demand, different water sources and management strategies, but a similar climate and similar critical issues regarding water shortage and timing of plans.

Fig. 3 Case study locations: Zakynthos (Greece) (top) and Sicily (Italy) (bottom). The overlap with the seasonal forecast data grid is shown on the right

The following section describes the information collected during the co-design process from both DEYA and Siciliacque. The methodology application is presented for the Greek case, whereas validation and the relative results are discussed for the Siciliacque case.

3.1 Collecting Information

Greece and Sicily (southern Italy) are two of the widest areas of the Mediterranean region. Precipitation is abundant in winter and spring, and scarce during summer. In conjunction with high summer temperatures, this may lead to recurrent conditions of drought. Moreover, relatively minor modifications of the general circulation can lead to substantial changes in the Mediterranean climate (Giorgi and Lionello 2008). This makes the Mediterranean area potentially vulnerable to climatic changes (Lionello et al. 2006; Ulbrich et al. 2006).

DEYA is the Public Water & Wastes Corporation of Greece, and it includes 227 DEYAs, one of which is in Zakynthos and mainly satisfies domestic water demands for about 40,000 inhabitants (Megalovasilis 2014). The main water source of the island is groundwater. DEYA manages 70 drill holes all over the island.

Siciliacque is a mixed public (25%) and private (75%) company of integrated water service (SII, Servizio Idrico Integrato), i.e. it is a second-level water utility. The main aim of Siciliacque is to collect water from different sources and distribute it to the local WUs. Siciliacque manages the water distribution infrastructure in an area that includes 1,600,000 inhabitants and focuses mainly on the civil use of water resources. It manages an average inflow of 88,000,000 m³ of water and distributes around 63,300,000 m³. The main water sources are surface water and groundwater from several hydrological basins.

Both WUs indicated the precipitation cumulated over 3 months as a commonly used variable to monitor their resources, given the correlation between rainfall and water source recharge. Depending on the specific type of water source, i.e., surface (e.g., dam) or underground (e.g., groundwater), on the size of the hydrographic basins and on the type of soil (which controls the infiltration and redistribution processes), the recharge time of the water source can be even longer, such as 5 months. According to this information, we selected CP3 and CP5.

In connection with the CP, the optimal timing to receive the prediction of the climate state is at least 3 months in advance, i.e. having seasonal forecast information 3 months ahead would be helpful to develop, for example, a new supply distribution strategy. Additionally, winter and spring are the periods when WUs are most interested in forecasts, in order to have predictions for the upcoming summer, which is often the most critical period in terms of water shortage for both islands. Such requirements provide indications for the choice of lead times (LTs). However, the skill of the predictions, which normally decreases at longer LT depending on the specific location (Arnal et al. 2018), has to be taken into account. For the validation test, we selected LT0 and LT3 (skills are reported in section 4), which means forecasts starting from the current month and from month 3, respectively. The months of interest to assess the forecast are from January to April and in some cases also December, which broadly correspond to the hydrological year.

Sicily experienced extreme drought events in 2001 and 2016, as indicated by Siciliacque and reported by the Superior Institute of Environmental Research and Protection (ISPRA) and the Sicilian Agrometeorological Information Service (SIAS) (SIAS 2002, 2016; ISPRA 2001, 2016). These two events were selected to validate the methodology.

3.2 Meteorological Dataset and Algorithm Parameters

The climatology has been assessed across 30 years of reanalysis data (1986–2016), as is commonly assumed in the literature (WMO 2018). An interpolation procedure has been applied to the SFs so that they overlap directly with the reanalysis grid.

The coordinates of the cells corresponding to the two case studies are reported in Table 1. All dataset properties used in this study and combinations selected for the validation are summarized in Table 1.

Table 1 Description of dataset properties and combinations

4 Results

4.1 Zakynthos Case

The algorithm is run for all combinations defined in section 2.3, for a total of 21 cases. The DTF is the first variable of interest derived from the calibration. As an example, Fig. 4a shows the DTF obtained for the CP3-LT0 combination, which was indicated by the WU as one of the most interesting for its strategy planning. The error bar obtained from a jack-knife procedure is also reported. The red line indicates the tercile threshold of 0.33.
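The details of the jack-knife procedure are not given here. As a generic illustration only, a leave-one-out jack-knife estimates the uncertainty of a statistic by recomputing it after removing one sample at a time. In the Python sketch below, `statistic` stands for any function of the sample (for instance, a routine deriving the DTF from the yearly hindcast-observation pairs); the function name and the use of the standard jack-knife standard error formula are assumptions of this example.

```python
import math

def jackknife(samples, statistic):
    """Leave-one-out jack-knife: return (mean of replicates, standard error)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    # Recompute the statistic n times, each time leaving one sample out
    replicates = [statistic(samples[:i] + samples[i + 1:]) for i in range(n)]
    mean = sum(replicates) / n
    variance = (n - 1) / n * sum((r - mean) ** 2 for r in replicates)
    return mean, math.sqrt(variance)

def sample_mean(values):
    """Placeholder statistic used only to exercise the sketch."""
    return sum(values) / len(values)

center, error_bar = jackknife([1.0, 2.0, 3.0, 4.0], sample_mean)
```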

Fig. 4 DTF (a) and maxFR (b) variation for the CP3-LT0 combination in the Zakynthos case

Results indicate that the optimal threshold is always higher than 0.33 except for two periods, i.e., jul-aug-sep (07-08-09) and dec-jan-feb (12-01-02), with the latter also showing the greatest uncertainty.

The skills of the forecasts for each rolling 3-month period are depicted in Fig. 4b in terms of maxFR. Over the most critical periods, i.e. the last two trimesters and the beginning of the year (which both correspond to the rainy season of the Mediterranean area), up to jun-jul-aug, skills are very promising, with mean values between 0.25 and around 0.35. This means that the algorithm correctly classifies the predictions in about 65–75% of the cases.

As demonstrated by other studies, the skills tend to worsen with increasing LT. Figure 5 provides a compact overview of the skills for all the possible combinations of LT and CPm (skill panel). Specifically, the x axis reports the first month of the cumulation period, from 01 (jan) to 12 (dec), whereas the y axis (or columns) indicates the CPm-LT combination, with increasing LT moving from bottom to top. Each cell reports the maxFR value with an associated color.

Fig. 5 Skill panel of the algorithm in correctly classifying the predictions in terms of maxFR, Zakynthos case

The bottom-left corner corresponds to the best skills: in January, at LT0, skills are satisfactory for all the simulation periods (up to CP6). This means that in January the algorithm would release reliable predictions for cumulated precipitation up to June, i.e. 6 months in advance. The most reliable combination is CP1-LT0 in March (maxFR = 0.15). Even only 1 month of cumulated precipitation known 1 month in advance could be useful for the WU, for example in the case of very fast water recharge. As highlighted in the previous section, the best combinations are not fixed but depend on the specific case (co-design process). The worst case, as expected, is for the greatest LT in May (maxFR = 0.75). The results of Fig. 5 provide a complete characterization of the algorithm in terms of skill. Based on this simple panel, the end user can decide whether or not to trust the methodology and thus whether to exploit the information.

4.2 Sicily Case: Validation

As a reference, Fig. 6 reports the 3-month cumulated precipitation for the analyzed years 2001 (red markers) and 2016 (blue markers) obtained from the reanalysis, as compared to the 30-year climatology represented by the boxplot. The first three trimesters of both years show values lower than the median (horizontal line in each box), confirming the anomalies from the expected values.

Fig. 6 Three-month cumulated precipitation observed in Sicily in 2001 and 2016. The boxplot describes the statistics and variability of the climatology evaluated over 30 years. Reanalysis data from ERA-Interim

A skill panel similar to the one obtained for Zakynthos was derived; it is not reported here for the sake of brevity. The results are similar to the ones presented in the previous section, with January and December among the months with the best skills. This matches the needs of Siciliacque, which identifies the period from December to April as the most critical one (section 3.1).

When used in the operational phase, i.e. based on real-time forecasts, the following information is provided and assessed: date of forecast, period of forecast, drought flag (yes/no), the associated FR (in %), the percentage of members within the dry class (DRY), the DTF value (in %) and the maxFR. While the latter is representative of the skill of the methodology for the selected parameter combination, FR provides the effective reliability associated with that specific forecast.

Table 2 reports the results obtained for Sicily, here shown for the combinations CP3-LT0 and CP3-LT3. The analyzed period covers Dec-2000 to Sept-2001 and Dec-2015 to Sept-2016. The last column of the table reports an indication of the anomalies in the observed cumulated precipitation.

Table 2 Results of forecast in terms of drought (yes/no), False Rate (FR), percentage of members within the dry class (DRY), Discriminant Threshold Frequency (DTF), maximum FR (maxFR). The last column indicates whether the observed cumulated precipitation is less than the expected value (P50)

The combined evaluation of FR, DRY and DTF provides a solid assessment of the reliability of the forecast, in addition to the maxFR. As an example, in the forecast Dec-2000, LT3, despite the low skill (maxFR = 56.25%), the algorithm correctly predicted lower-than-normal precipitation, with a low probability of error (FR = 8%). Indeed, the DRY percentage is much greater than the DTF. Anomalies in precipitation for a period up to 5 months are sufficiently well detected. Anomalies over the jan-feb-mar period are also well predicted in Jan-2001, but with a higher FR (equal to 15%), despite the same DRY and similar maxFR and DTF.

The resulting FR values depend on the shape of the corresponding sensitivity-specificity curves, which are shown in Fig. 7 for the cases CP3-LT3 in mar-apr-may (Fig. 7a) and CP3-LT0 in jan-feb-mar (Fig. 7b). In the former combination, the sensitivity curve is steeper than in the latter case, indicating a better capability of discerning drought events, since the sensitivity moves rapidly towards 1 (Fig. 7a). In the latter combination, the FR value corresponding to the same frequency of 0.48 (i.e., the percentage of members within the dry class, DRY) is significantly higher because the sensitivity curve indicates lower skill in catching the correct classification within the dry class (gentle slope of the curve).

Fig. 7 Sensitivity and specificity curves for the combinations CP3-LT3 in mar-apr-may (a), CP3-LT0 in jan-feb-mar (b) and CP3-LT3 in apr-may-jun (c), which correspond to the cases Dec-2000, LT3 and Jan-2001, LT0 analyzed in Table 2

In Jan-2001, the algorithm correctly predicted the absence of anomalies for the period apr-may-jun with an associated FR equal to zero; indeed, the specificity value corresponding to the DRY percentage of 16% (i.e. frequency of ensemble members equal to 0.16) is 1 (Fig. 7c), which is representative of a fully correct classification of events as belonging to the not-dry class.

Lower-than-normal precipitation over the first months of 2016 is also very well predicted, even 3 months in advance (Dec-2015, LT0 and LT3; Jan-2016, LT0 and LT3; Feb-2016, LT0). In Mar-2016, the LT0 and LT3 predictions cannot be considered reliable, since the FR values are high. Similar observations can be made for Apr-2016.

Overall, the results demonstrate the potential of such a methodology for planning management strategies. In case of water shortage, the main strategy of Siciliacque is based on the possibility of modifying the supply plan and of withdrawing water from the reserves where the level is higher. The water distribution network is well interconnected and, thanks to the recent investments made in the infrastructure, it is possible to face water shortages by changing the source of water supply. The management of critical events derives from the trimestral plan for water supply developed by the Environmental Department of the Regional Administration. The technical unit of the WU takes action and develops the new distribution strategies following the guidelines of this plan, which gives priorities to the sectors in which water should be used in case of shortage. Therefore, knowing in advance possible anomalies in the water reservoir recharge would be optimal for establishing the enhanced distribution.

5 Conclusion

This study describes a novel methodology for drought predictions thanks to the exploitation of SFs. The methodology aims at assessing the occurrence of anomalies in the cumulated monthly precipitation, compared to the expected values, and defines a drought alert system for the upcoming season, up to 6 months.

The proposed algorithm integrates seasonal hindcasts and reanalysis to assess the overall skill of the classification system (maximum false rate) and to estimate reliabilities associated to the real-time drought predictions (false rate). Computation is applied for selected target and cumulation period, i.e. months of the year and time window over which precipitation is cumulated. Two case studies are carried out to test and validate the procedure across two areas of the Mediterranean region, which is frequently hit by extreme droughts and which is particularly vulnerable to consequences of climate change.

Results and highlights are summarized as follows:

  • The algorithm does not necessarily require bias correction, since it is based on a comparative assessment of statistics.

  • The skill of the predictions depends on the selected target period, cumulation period and lead times, in agreement with previous studies that used SFs (e.g. Doblas-Reyes 2012; Marcos et al. 2017; Arnal et al. 2018). Up to 5 months of cumulation period, months ranging from November to April showed the best skill at lead time zero. This period matches very well with the needs of the WUs, here identified as possible users of the methodology.

  • Summer months (e.g. June, July, August) showed low overall skill, especially at long lead times; the result agrees with the findings obtained in Sicily by Bonaccorso et al. (2015), who found that during the winter-spring season the precipitation series (in terms of SPI) exhibits the best correlation with the climate predictor variable NAO and that the influence of NAO weakens as the lead time increases. However, this might not constitute an issue, since WUs are interested in predictions for the rainy months.

  • Ultimately, the reliability of the forecast in real-time depends on the effective frequency of the ensemble members that fall within the dry class; therefore, in some cases, even a low-skill combination could provide a reliable prediction.

  • The application to the Sicilian case demonstrated that the method correctly predicts the drought 3 months in advance.

The methodology enables the user to decide whether or not to exploit the drought alert, based on an acceptable risk. In conclusion, its formulation is particularly flexible. Forecasts of precipitation anomalies months in advance may be crucial for understanding possible delays in water resources recharge, and thus may be strategic for water supply management.

Further development of the methodology includes the analysis at multiple grid cells and, if required, a downscaling procedure.