Hydrological drought forecasts outperform meteorological drought forecasts

One of the most effective strategies to reduce the impacts of drought is to issue timely and targeted warnings to end users from months to seasons ahead. Yet accurately forecasting the drought hazard on a sub-seasonal to seasonal time scale remains a challenge, and usually meteorological drought is forecasted instead of hydrological drought, although the latter is more relevant for several impacted sectors. Therefore, we evaluate the hydro-meteorological drought forecast skill for the pan-European region using a categorical drought classification method. The results show that the hydrological drought forecasts outperform the meteorological drought forecasts. Hydrological drought forecasts even show predictive power (area with perfect prediction > 50%) beyond two months ahead. Our study also concludes that dynamical forecasts, derived from seasonal forecasts, have higher predictability than ensemble streamflow predictions. The results suggest that further development of seasonal hydrological drought forecasting systems is beneficial, and particularly important in the context of global warming, where drought hazard will become more frequent and severe in multiple regions of the world.


Introduction
Drought is one of the most severe weather-related natural hazards, causing damage and losses comparable to other destructive hazards, such as floods, landslides, and earthquakes [1]. To reduce the impacts of weather-related drought hazards, the development of drought Early Warning Systems (EWS) is of utmost importance. However, drought EWS, or drought modules within a multi-hazard EWS, which typically have a sub-seasonal to seasonal forecast time scale because of the creeping nature of this hazard, are still less developed than many other EWS modules for short time-scale hazards. Weather forecasts, the main component of the warning system, have in the past lacked the skill to produce reliable forecasts for periods longer than days and weeks [2,3]. In the last decade, dynamical seasonal forecast systems based on numerical prediction models have become more skillful. These models generally have shown greater skill than statistical models [4]. Improvement of the probabilistic seasonal forecasts, therefore, has great potential for seasonal drought forecasts. As a result, drought EWS modules, including a seasonal forecasting component, have been developed in some regions, such as the US, Europe, and Africa [5][6][7][8][9].
Following the improvement of the dynamical seasonal forecasts, many studies have been conducted to assess the skill of hydro-meteorological forecasts, e.g. precipitation, soil moisture, and streamflow forecasts [10][11][12][13][14]. Note that the generation of hydrological time series cannot be classified as drought forecasting in the narrower sense; such time series are known as hydro-meteorological forecasts. Hydrological drought forecasts require, like hydro-meteorological forecasts, a state-of-the-art large-scale hydrological model fed by a probabilistic or deterministic forecast as the starting point of the chain [15]. However, an additional step in the chain must be carried out: applying a drought identification method to the forecasted time series of hydro-meteorological variables [9]. The skill of the drought prediction is therefore highly reliant on the reliability of the meteorological forecast, the ability of the hydrological model to realistically simulate the water cycle over a large-scale region [16], and the drought identification method (either standardized or threshold-based approaches, [17]). In terms of seasonal hydrological drought forecasts, only a few studies have assessed the skill of the forecasts and compared them with meteorological drought forecasts [18][19][20][21][22]. However, they do not include all necessary hydrological variables. For example, reference [18] compared the forecasted meteorological drought (i.e. the Standardized Precipitation Index, SPI) with the forecasted soil moisture derived from the Variable Infiltration Capacity (VIC) model, reference [19] assessed the skill of precipitation and soil moisture forecasts derived from the VIC model, and reference [22] compared the forecasted SPI with the Standardized Streamflow Index (SSI) that was calculated using the VIC model. Moreover, the spatial resolution of 0.25° (~30 km) used in the previous studies is too coarse for most of the impacted sectors.
This study aims to overcome these critical issues. In a comprehensive manner, we investigated drought severity classes for hydrological variables runoff and groundwater for different seasons, and at a detailed spatial scale (5 km) coming closer to the needs for impact assessment by end-users [23]. Providing drought forecasting scores in a full set of hydrometeorological variables, e.g. precipitation, precipitation minus evaporation, runoff, and groundwater, which are derived from the recently established ANYWHERE Drought EWS (AD-EWS, [9]), is a step forward of this study compared to aforementioned studies that mainly focus on meteorological drought forecasts, hydrological forecasts, and/or hydrological drought forecasts for only one variable. This allowed us to assess whether hydrological drought forecasts perform better than meteorological ones, which is the main focus and novelty of this study. Moreover, we investigated if drought forecast scores obtained from the dynamical forecast have more skill than those derived from the Ensemble Streamflow Prediction (ESP).

Data
All the hydrological variables used in this pan-European study were simulated using the LISFLOOD hydrological model [24,25] fed by: 1) the ECMWF SEAS re-forecast S4 downscaled to 5 × 5 km (680 × 810 grid cells, longitude × latitude, with in total 259,023 land cells), 2) 20 years of resampled historical meteorological observations interpolated to 5 × 5 km grid cells, and 3) gridded meteorological observations, precipitation and evapotranspiration from 1990 to 2017, interpolated to 5 × 5 km grid cells. Potential evaporation and evaporation rates (including transpiration) are calculated through the offline LISVAP pre-processor based on the Penman-Monteith equation [26]. The model was calibrated using time series of observed river discharge from over 200 catchments across Europe. A number of parameters were tuned that control snowmelt, overland flow, river flow, infiltration, and residence times in the soil and subsurface [27,28]. LISFLOOD obtained a median Nash-Sutcliffe efficiency (NSE) of 0.57 over the validation period. Moreover, these studies have demonstrated that LISFLOOD is able to simulate streamflow drought.
The LISFLOOD output obtained with the re-forecasted ECMWF SEAS4 (number 1) is known as the hindcast, and the output obtained with the resampled historical meteorological data (number 2) is referred to as the Ensemble Streamflow Prediction (ESP) [29]. The ESP was included in this study to explore whether the dynamical seasonal forecast provides added value. The model outputs run using the gridded meteorological observations (number 3), known as LISFLOOD-Simulation Forced with Observations (SFO), were used as a proxy for observed hydrological variables. These variables are total runoff, which consists of surface and sub-surface runoff, and groundwater at the upper layer. The variables obtained from the LISFLOOD seasonal re-forecast, ESP, and LISFLOOD-SFO simulation from 2002 to 2008 were used to identify the drought events for the re-forecast, ESP, and proxy observed time series, respectively. We used the median of 15 and 20 ensemble members of the re-forecast and ESP data, respectively, with a lead-time of 5 months. We had to use the proxy hydrological data because in situ observational data for runoff and groundwater at the pan-European level are not available. However, the use of SFO data as a reference for observations is common practice in forecast evaluation studies [4,14,18,20,22,30,31,32,33]. Detailed information about the ECMWF re-forecasts, ESP, and the LISFLOOD hydrological model can be found in references [9,33,34]. The flowchart that summarizes the data and methods section is presented in figure S1 in the supplementary material (stacks.iop.org/ERL/15/084010/mmedia).

Meteorological and hydrological drought indices
In this study, we applied the standardized approaches for the assessment of hydro-meteorological droughts. The Standardized Precipitation Index (SPI, [35]) and the Standardized Precipitation Evaporation Index (SPEI, [36]) were used to represent meteorological drought, while the Standardized Runoff Index (SRI, [37]) and the Standardized Groundwater Index (SGI, [38]) were used to represent hydrological drought. Here we explain the SPI concept only because the SPEI, SRI, and SGI were calculated using the same concept as SPI. The SPI is designed to quantify a precipitation anomaly (for both dry and wet conditions) at different time scales (e.g. accumulation from 1 month up to 12 months, SPI-x, x = 1, 2, ..., 12) for any grid-cell or site [35]. The SPI calculation for any grid cell is based on a long-term observed precipitation record that is fitted to a probability distribution (e.g. Gamma), which is then transformed into a normal distribution. Thus, the median SPI for the grid cell and the selected accumulation period is zero. Positive SPI values indicate greater than median precipitation and negative values indicate less than median precipitation. A drought event is assumed to occur when the SPI is below zero and ends when the SPI becomes positive. In this study, we determined the SPI using accumulation periods of 1, 3, 6, and 12 months (i.e. SPI-1, SPI-3,…, SPI-12).
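The transformation described above (fit a distribution to the reference record, then map the cumulative probability onto a standard normal variate) can be illustrated with a short sketch. It is not the AD-EWS implementation: the function names are hypothetical, the two-parameter gamma distribution is fitted by the method of moments (the fitting method is an assumption, as it is not stated here), and the accumulated totals are assumed to be strictly positive, so months with zero precipitation would need separate treatment.

```python
import math
from statistics import NormalDist, mean, pvariance


def gamma_cdf(x, shape, scale):
    """Gamma CDF, i.e. the regularized lower incomplete gamma function,
    evaluated with its power series (adequate for moderate x / scale)."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def fit_gamma_moments(values):
    """Method-of-moments estimates (shape, scale); an assumed fitting choice."""
    m = mean(values)
    v = pvariance(values)
    return m * m / v, v / m


def standardized_index(value, reference):
    """Standardized index of `value` relative to the reference record
    (e.g. all accumulated totals for one calendar month and one grid cell)."""
    shape, scale = fit_gamma_moments(reference)
    p = gamma_cdf(value, shape, scale)
    p = min(max(p, 1e-6), 1.0 - 1e-6)   # keep the inverse normal finite
    return NormalDist().inv_cdf(p)


# Hypothetical accumulated precipitation totals (mm) for one month and cell
reference = [40, 55, 60, 72, 80, 95, 110, 130]
dry = standardized_index(30, reference)    # negative: drier than the median
wet = standardized_index(140, reference)   # positive: wetter than the median
```

The same function would apply to precipitation minus evaporation, runoff, or groundwater once an appropriate distribution is substituted, which is why only the SPI concept needs to be spelled out.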
We used the gamma distributions obtained from the proxy observed data for each index (except SPEI), month, and grid cell (input dataset 3, see above) to calculate the re-forecasted drought indices (SPI, SRI, and SGI) (3 indices, 12 months, 259,023 land cells). The three-parameter log-logistic distribution was used to calculate the re-forecasted SPEI (1 index, 12 months, 259,023 land cells). The gamma distribution has a flexible shape parameter, which makes it applicable to the wide range of accumulated precipitation in Europe [39]. Reference [40] also shows that the gamma distribution can be used for hydrological forecasting of both high and low flows. The SPI-x for the re-forecasts (up to 7 months ahead) was calculated from the monthly re-forecasts, supplemented with observational data from the preceding months where needed for shorter lead-times [41]. For example, to calculate the SPI-6 for the first month, 5 months of observed data are accumulated with the first month of forecasted data. To calculate the SPI-6 for the second month, 4 months of observed data are accumulated with the first 2 months of forecasted data, and so on up to the seventh month of the forecast lead-times [9,18,41]. This implies that shorter lead-times and longer accumulation periods have a higher proportion of observed data.
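The blending of observed and forecasted months can be made concrete with a small sketch. The function name and argument layout are hypothetical; the logic only follows the SPI-6 example given above (lead-time 1 uses 5 observed months plus 1 forecasted month, lead-time 2 uses 4 plus 2, and so on).

```python
def accumulated_total(observed, forecast, accumulation, lead_time):
    """Total over `accumulation` months ending at forecast month `lead_time`.

    observed: monthly values up to the forecast issue date, oldest first.
    forecast: monthly forecasted values, lead-time 1 first.
    """
    n_forecast = min(lead_time, accumulation)   # forecasted months in the window
    n_observed = accumulation - n_forecast      # observed months filling the rest
    if n_observed > len(observed) or lead_time > len(forecast):
        raise ValueError("not enough data for the requested window")
    forecast_part = forecast[lead_time - n_forecast:lead_time]
    observed_part = observed[len(observed) - n_observed:]
    return sum(observed_part) + sum(forecast_part)


# Hypothetical monthly values
observed = [10, 20, 30, 40, 50]
forecast = [1, 2, 3, 4, 5, 6, 7]

spi6_lt1 = accumulated_total(observed, forecast, accumulation=6, lead_time=1)
spi6_lt2 = accumulated_total(observed, forecast, accumulation=6, lead_time=2)
spi1_lt3 = accumulated_total(observed, forecast, accumulation=1, lead_time=3)
```

With a 1-month accumulation period the window never reaches back into the observations, which is why the 1-month indices contain no observed data at any lead-time.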

Drought class and forecasting score
In this paper, we introduce a new technique, i.e. a categorical drought classification, to analyze the score of the forecasts by comparing the re-forecasts or ESP with the proxy observed data. This simple technique provides drought forecasting scores based on differences in drought classes that are familiar to and widely used by end users [42], including water managers and politicians. The scores are easy for end users to understand [23,43]. We calculated the scores for each of the forecasted hydro-meteorological droughts (precipitation, precipitation minus evaporation, runoff, and groundwater). We used drought classes to describe the severity of drought in the various hydro-meteorological variables and to determine the forecasting score. The SPI drought severity classes are as follows: mild drought for 0 > SPI ≥ -0.99, moderate drought for -1.00 ≥ SPI ≥ -1.49, severe drought for -1.50 ≥ SPI ≥ -1.99, and extreme drought for SPI ≤ -2.00 [35]. For the SPEI, SRI, and SGI, we used identical classes [36][37][38].
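As an illustration, the severity classes can be written as a simple lookup. The function name is hypothetical, and the sketch assumes half-open intervals so that values falling between the two-decimal class limits (e.g. -0.995) are still assigned a class; non-negative index values are labeled as no drought.

```python
def drought_class(index_value):
    """Map a standardized index value (SPI, SPEI, SRI or SGI) to a severity class."""
    if index_value >= 0.0:
        return "no drought"
    if index_value > -1.0:
        return "mild"
    if index_value > -1.5:
        return "moderate"
    if index_value > -2.0:
        return "severe"
    return "extreme"


# Hypothetical index values for five grid cells
classes = [drought_class(v) for v in (0.3, -0.5, -1.0, -1.7, -2.0)]
```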
We assigned a number to each drought severity class, as follows: 1: no drought, 2: mild drought, 3: moderate drought, 4: severe drought, and 5: extreme drought. The drought forecast score was determined by computing the difference between the drought class number derived from the median of the ensemble of re-forecasted data and the number obtained from the proxy observed data. A similar procedure was followed for the ESP forecast score. A perfect forecast is achieved if the score is zero (white color), meaning that there is no difference between both drought classes. A positive score (bluish colors e.g. in figures 1(c) and (e)) indicates over-forecasting and a negative score (reddish colors e.g. in figures 1(c) and (e)) denotes under-forecasting. We applied bluish colors for over-forecasting and reddish colors for under-forecasting because under-forecasting the hazard involves more risk for water managers dealing with drought planning than over-forecasting. The percentage area of each drought forecast score was calculated by summing up all land cells that have the same class difference, dividing by the total number of land grid cells, and multiplying by 100 percent. We averaged the three monthly percentages of drought forecast scores for winter (DJF), spring (MAM), summer (JJA), and autumn (SON) for the seasonal analysis.
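The score and the area percentages follow directly from the class numbers. The sketch below is a minimal illustration with hypothetical function names; it assumes that every land cell already carries a class number from 1 to 5 for both the forecast and the proxy observation, and that a score absent in a given month counts as 0% when averaging over a season.

```python
from collections import Counter


def area_percentages(forecast_classes, reference_classes):
    """Percentage of land cells per forecast score (forecast minus reference class).

    Score 0 is a perfect forecast, positive scores are over-forecasting,
    negative scores are under-forecasting.
    """
    if len(forecast_classes) != len(reference_classes):
        raise ValueError("both inputs must cover the same land cells")
    counts = Counter(f - r for f, r in zip(forecast_classes, reference_classes))
    total = len(forecast_classes)
    return {score: 100.0 * n / total for score, n in counts.items()}


def seasonal_average(monthly_percentages):
    """Average the monthly percentages of a season (e.g. three months of JJA)."""
    scores = set().union(*monthly_percentages)
    n = len(monthly_percentages)
    return {s: sum(m.get(s, 0.0) for m in monthly_percentages) / n for s in scores}


# Hypothetical class numbers for five land cells
forecast_classes = [1, 2, 3, 5, 2]
reference_classes = [1, 2, 2, 1, 3]
monthly = area_percentages(forecast_classes, reference_classes)

season = seasonal_average([
    {0: 60.0, 1: 40.0},
    {0: 90.0, -1: 10.0},
    {0: 90.0, 1: 10.0},
])
```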
Three different color scales (figures 2 and 4), green, brown, and red, were applied to classify the percentage of Europe (% of cells) in a certain drought forecast class, or group of drought forecast classes, for a particular accumulation period (e.g. SRI-1). The 25th, 50th and 75th percentiles were used as class limits. These were applied for all forecast scores (class difference: none, plus, and minus) derived from all seasons (n = 4) and lead-times (n = 5). For perfect forecasts (class difference: none), the 25% largest European areas (above the 75th percentile) with a perfect forecast across the 20 cases (4 seasons × 5 months lead-times) are assigned the green color, whereas the 25% smallest areas (below the 25th percentile) are assigned the red color. The two intermediate classes have gradient colors from green to brown (from the 75th to the 50th percentile) and from brown to red (from the 50th to the 25th percentile). The colors are reversed for the group with forecast scores +4 to -4, reflecting that small areas with certain drought class differences represent a higher predictive power than large areas. Furthermore, we calculated the percentiles from the percentage of areas (n = 20) that have the same absolute class (e.g. class -1 is grouped with +1, class -2 is grouped with +2, and onwards). Please see reference [9] for detailed information on the drought forecast score and color coding.
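The percentile-based color coding can be sketched as follows. The function name and color labels are hypothetical, the percentile interpolation method is an assumption (it is not specified here), and values that coincide exactly with a class limit are placed in the lower class.

```python
from statistics import quantiles


def color_classes(area_percentages, perfect=True):
    """Assign a color class to each area percentage using quartile limits.

    perfect=True  : large areas are good (green above the 75th percentile).
    perfect=False : colors reversed, since a large area with a class
                    difference means lower predictive power.
    """
    q25, q50, q75 = quantiles(area_percentages, n=4, method="inclusive")
    labels = ["red", "brown-red", "green-brown", "green"]
    if not perfect:
        labels = labels[::-1]
    result = []
    for value in area_percentages:
        rank = (value > q25) + (value > q50) + (value > q75)
        result.append(labels[rank])
    return result


# Hypothetical area percentages; the study uses 20 cases (4 seasons x 5 lead-times)
areas = [10, 20, 30, 40, 50, 60, 70, 80]
perfect_colors = color_classes(areas)
mismatch_colors = color_classes(areas, perfect=False)
```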

Hydrological drought forecasting scores
An example of forecasted drought in runoff for August 2003 (summer) is presented in figure 1. The 2003 drought was selected because it is known as one of the most recent severe pan-European droughts [44]. The forecasted drought in the pan-European region for a 1-month lead-time (LT) is, in general, in good agreement with the SFO (figures 1(a) and (b)). Both the forecast and the SFO runoff show mild drought in a large area and moderate drought in a small area of central Europe. North UK and Ireland were forecasted to have mild drought; however, moderate and extreme drought were observed. The comparison between forecasted drought and SFO shows that the forecast over-estimates the drought class in northeast Europe and Portugal, and under-estimates it for a large part of central Europe and the UK (figure 1(c)). The forecast using ESP also produces a drought class and area similar to those of the dynamical forecast, with a higher drought class in east Europe and east Russian Federation (figure 1(d)). Moreover, drought severity prediction by the ESP has higher (lower) drought classes, indicated by more dark bluish (reddish) colors, than the forecast (figures 1(c) and (e)), i.e. over-estimation.
A comprehensive comparison of hydrological drought forecasting scores in runoff and groundwater is presented in figure 2 and S2, respectively. Figure 2 shows that forecasting of hydrological droughts one month ahead (LT = 1) attains high scores, with values of perfect prediction (none: no class difference between SFO and forecast) in general around 70% for SRI-1 and close to 90% for SRI-6 and above. For longer lead-times this percentage is around 50% for SRI-1 and SRI-3 (for SRI-6 and SRI-12 percentages are higher). SGI shows similar scores as SRI-1 (figure S2). During spring, the score of hydrological drought forecasts is lower than in other seasons; it goes down close to 60% (SRI-1 and SGI-1, LT = 1). The performance of the forecasts improves with accumulation periods. The SRI-12 (LT = 1) has the highest performance with perfect forecasts above 90% for all seasons. This is plausible since the observational data (i.e. SFO) were added in the drought analysis for higher accumulation periods (see Data and Method section). Figure 2 also shows that the hydrological drought forecasts generally over-estimate the drought severity, but mainly by not more than 2 classes, indicated by higher percentage of mismatch between forecast and SFO for class differences +1 and +2 (green dominant in -1 and -2) relative to negative ones (red dominant in +1 and +2). For example, for runoff drought (SRI-1) with LT = 1 month in the summer season, 11.5-14.6% of the pan-European area has a forecasted drought class that is 1-2 classes higher (more severe) than SFO, whereas in 0.1-0.2% of the area the forecasted drought class is one to two levels lower (less severe) than derived from the SFO. The general slight over-estimation of hydrological drought forecasts may be caused by precipitation and temperature biases produced by the forecasts rather than by the hydrological model since we have used the same model for forecasts and proxy for observations.
Hydrological drought prediction using ESP shows slightly lower drought forecasting scores, with the SRI-1 showing around 2-3% smaller areas with perfect forecasts than the probabilistic dynamical forecasts for LT = 1 (figure S3 and S4 for SRI and SGI, respectively). This does not apply to forecasts issued in spring since, in this season, the ESP has a slightly larger area with a perfect score. For higher accumulation periods than 1 month (e.g. SRI-3), the difference in areas with 1-month accumulation period becomes smaller due to an integration of proxy observational data in both forecasts.
The relatively lower predictive power of hydrological drought forecasts issued in spring may relate to the timing of snow melting associated with biases in temperature prediction [45]. Forecasting of too early snow melting generates more runoff in the early spring resulting in a lower runoff, and hence a more severe drought by the end of the spring season (i.e. warm snow season drought). On the contrary, forecasting of too late snowmelt leads to a drought in the early spring since less runoff is generated due to frost conditions during winter and early spring (cold snow season drought) [46]. The mismatch on the prediction of too early or late snow melting in spring (especially in March) may be due to an oversimplification of the snowmelt module and coarse elevation data used in the LISFLOOD model, resulting in a bias in the temperature-dependent simulation of snowfall and snowmelt [45]. The hydrological drought forecasts for March 2003 with LT 1-month show mild to moderate drought for almost all regions in north Europe, while the SFO only shows mild drought in a small part of north Europe (not shown).

Meteorological drought forecasting scores
Meteorological drought forecasting, represented by drought in precipitation using the SPI-1 in August 2003, is given in figure 3. Both the probabilistic dynamical forecast and the ESP produce a different pattern of drought-affected regions for the summer season and for LT 1-month compared to the observed SPI-1 (figures 3(a), (b) and (d)). The probabilistic dynamical drought forecast tends to produce less severe droughts (lower drought classes shown by more reddish colors) than the observed in most European regions (figure 3(c)). The deficiency of the dynamical forecast model to produce more extreme droughts can be recognized particularly in the UK, west EU, central EU, and east EU, with class differences up to -4 (reddish colors, figure 1(e)). This deficiency is then translated to hydrological drought since we used the forecasted meteorological data to run the LISFLOOD model, as shown by the corresponding locations of lower and higher forecasted drought classes in Europe for the two forecasts (figures 1(c) and 3(c)). The forecast based upon the ESP also produces drought classes lower than the observed, with class differences up to -4 (figure 3(e)). The drought forecasts for northern European regions mostly over-estimate the drought class, indicated by bluish colors in figures 3(c) and (e). The ESP forecasts show even more widespread bluish colors in north EU up to west Russian Federation. Figure 4 clearly shows the inadequacy of the probabilistic dynamical meteorological forecasts to produce the same drought classes as derived from observed precipitation. This is indicated by rather small pan-European areas with perfect predictions, e.g. 45-70% for SPI-1 and SPI-3, as well as for SPEI-1 and SPEI-3 (figure S5). Compared to hydrological drought forecasts (figure 2), meteorological drought forecasts produce higher class differences up to +4, meaning lower predictive power, except for higher accumulation periods, such as SPI-6 and SPI-12.
SPI-6 can forecast the drought area reasonably well (over 60%) with an LT up to 2-month in all seasons. However, a lot of observational data are included in the SPI-6 with lead-times of 1 and 2 months (see the Data and Method section). Clearly, SPI-12, which has the longest accumulation period, produces the highest forecasting score with drought class mismatch only up to 2-class difference. However, at least 5 months of observations are included in the SPI-12 (LT = 7 months).
The highest forecasting score of meteorological drought for long accumulation periods (e.g. SPI-6 and SPI-12), as expected, is achieved for winter, followed by autumn and spring (figure 4) [32,33]. Summer is the season that has the lowest score compared to others for accumulation periods longer than 1 month. The higher score of meteorological drought forecasting for winter than summer could be due to better precipitation predictions with the forecast model, in particular, as a response to the North Atlantic Oscillation (NAO), as one of the strongest predictors in seasonal forecasts in Europe [47,48]. The low skill for meteorological drought forecasts in the summer might be due to the intense precipitation events (more convective type) and evapotranspiration, which challenge accurate weather prediction in summer [33,49,50,51].
SPI drought prediction using ESP shows even lower scores than using the dynamical forecasts (figure S6). For SPI-1 with a lead-time of 1 month, the ESP only yields perfect prediction in less than 40% of the pan-European area, which is >10% lower than the probabilistic dynamical forecast. As expected, the difference between ESP and dynamical forecasts becomes less for higher accumulation periods.

The effect of memory on the forecasting score
The comparison between hydrological and meteorological drought forecasts clearly shows the higher predictive power of hydrological drought forecasts, even beyond 2 months (area with perfect prediction >50%). This is plausible since hydrological variables used here are affected by land surface water storage (e.g. soils, groundwater) that pools, attenuates, lengthens and delays the effect of the driving forces (i.e. precipitation) as reported in several studies [17,52,53,54,55]. Some previous studies confirmed that catchment control could be as important as climate control [56,57]. The skill of SRI and SGI is quite comparable because we used the total runoff data, which includes both the surface and sub-surface runoff. Sub-surface runoff has more catchment memory reflecting storage processes in soil and groundwater than the surface runoff, which is not strongly related to precipitation. Groundwater data used in this study is groundwater storage from the upper domain (Data section).
Meteorological drought forecasts for longer accumulation periods than 1 month and short lead-times also contain memory in the sense that these contain observed meteorological data. However, even when monthly meteorological data are accumulated over periods of several months (e.g. SPI-3 and SPEI-3), the predictive power of hydrological drought forecasts is higher due to catchment memory (figure S7). The score of meteorological drought forecasts improves with the increase of the accumulation periods of the SPI and SPEI [41] because of a higher proportion of observed data, which artificially inflates forecast scores. Clearly, SPI-12 and SPEI-12 with an LT of 1 month, which include 11 months of observed data, produce the highest score (> 80%, figures 4 and S5, respectively). The score for meteorological drought indices for longer accumulation periods should not be misinterpreted. Our findings show that meteorological drought forecasts taking account of memory through inclusion of past observational data (accumulation periods) might be an alternative for hydrological drought forecast if hydrological drought forecasts are not available. For instance, the forecast score of SPI-3 and SPEI-3 are marginally comparable to forecasted drought in runoff and groundwater with 1-month LT (SRI-1 and SGI-1, respectively), which contain no preceding SFO data (figure S7).

Conclusions and future improvement
This research shows the strengths of hydrological drought forecasts in predicting drought in runoff and groundwater from one month up to several months ahead. These forecasts outperform meteorological drought forecasts and complement conventional hydrological forecasts (e.g. streamflow). This opens an opportunity for water managers and stakeholders dependent on water resources planning and management to rely more on hydrological drought forecasts than solely on meteorological forecasts. Our findings also highlight the importance of memory in hydro-meteorological drought forecasts, e.g. land surface water storage. The highest score can be achieved using hydrological variables, such as runoff and groundwater, or alternatively, if no hydrological data are available, using meteorological variables (e.g. precipitation, precipitation minus evaporation) for longer accumulation periods that include observed data.
In this research, we used the LISFLOOD hydrological model fed by the ECMWF forecast system SEAS-4 to produce seasonal hydrological drought forecasts. The use of multi-model ensemble seasonal forecasting system (climatic and hydrological) may increase the skill of drought forecasts [14,58] although the improvement may not be very significant compared to a single model that has high predictive skill [31,59]. The LISFLOOD model that we used in our study was selected for operational flood and drought forecasting by Joint Research Center (JRC) by considering many factors in the model selection, including e.g. model's uncertainty, cost of implementation, and feasibility of technical implementation [60,61].
We compared our drought index forecasts derived from the LISFLOOD model fed by the probabilistic ECMWF SEAS-4 with forecasts based upon the Ensemble Streamflow Prediction (ESP). We found that in most cases the hydrological drought forecasts by the ECMWF SEAS-4 are slightly better than the ESP. However, the ESP shows a big deficiency in predicting meteorological drought for an LT of 1 month. Previous studies using the ECMWF SEAS-4 to feed hydrological models also conclude that the ECMWF SEAS-4 is, in general, more skillful than the ESP for certain seasons and lead-times [20,33]. Therefore we conclude that the ESP could be used to forecast hydrological droughts, if dynamical forecasts are not available.
The skill of hydrological drought forecasts may even improve with the replacement of the ECMWF SEAS-4 with the newest forecasting system SEAS-5 and the improvement of the LISFLOOD model that started in 2019 [62]. We also note that the evaluation of drought forecasts was performed against SFO data as a benchmark for observed, because gridded hydrological variables across Europe are not available. The skill of hydrological drought forecasts might decrease if we replace the SFO data with in-situ observation due to e.g. model uncertainty. This experiment was discussed in reference [34] in which they used re-forecasted river discharge that was compared with discharge derived from SFO and gauged data. They concluded that the skill of hydrological drought forecasts is still higher than the SPI obtained from 3 months accumulated precipitation in the Guardiola catchment. Thus, changing the SRI derived from the SFO data with the SRI obtained from in-situ observed data will not change our conclusion that hydrological drought forecasts outperform meteorological ones. If drought events will become more frequent and severe in the twenty-first century due to global warming [63,64], the development of skillful hydrological drought forecasting system will be of the utmost importance.

Data availability
The EFAS data are accessible under a COPERNICUS open data license (https://doi.org/10.24381/cds.e3458969). In this study we used EFAS system version 2. The drought indices analyses using dynamical