Drought impact prediction across time and space: limits and potentials of text reports

Drought impact prediction can improve early warning and thus preparedness for droughts. Across Europe, drought has affected and will continue to affect the environment, society and economy, with increasingly costly damage. Impact models are challenged by a lack of data; reported impacts archived in established inventories may therefore serve as a proxy for missing quantitative data. This study develops drought impact models based on the Alpine Drought Impact report Inventory (EDII ALPS) to evaluate the potential to predict impact occurrences. As predictors, the models use drought indices from the Alpine Drought Observatory and geographic variables that account for spatial variation in this mountainous study region. We implemented logistic regression and random forest (RF) models and tested their potential (1) to predict impact occurrence in other regions, e.g. regions without data, and (2) to forecast impacts, e.g. for drought events in near real time. Both models show skill in predicting impacts for regions similar to the training data and for time periods that have been extremely dry. Logistic regression outperforms RF when predicting to very different conditions. Impacts are predicted best in summer and autumn, the two seasons with the most reported impacts, which highlights the relevance of accurately predicting impacts during these seasons to improve preparedness. The model experiments presented here show how impact-based drought prediction can be approached and how it can complement index-based early warning of drought.


Introduction
Impact-based prediction of natural hazards offers new possibilities for emergency management and adaptation strategies for decision-makers and practitioners [1]. Currently, early warning systems typically predict the magnitude, location and timing of potentially damaging hazard events, but rarely provide impact estimates, such as expected physical damage, affected people, disruption of service or financial loss [2]. Predictions (or forecasts) including such impact or damage data may increase willingness to take protective action and have received broad support from a communication perspective [3,4].
The methodologies of existing impact-based predictions vary considerably across natural hazards. For earthquakes, for example, operational impact forecasting systems have been implemented, whereas for floods only prototypes exist [2]. For drought, several efforts have been made. For Africa, an operational system focusing on famine and loss of life caused by drought has been developed [5]. In contrast, no impact-based drought prediction is in place for the US, despite extremely damaging drought events, in particular for agriculture [6]. For Europe, only a limited number of scientific studies have incorporated drought impacts into modelling approaches [2]. However, the increasing potential for severe drought impacts in the near and far future has been repeatedly highlighted [7-9], and such impacts should therefore be integrated into early warning systems.
Drought impact modelling is complicated by the multifaceted nature of drought [10]. As there is no consensus on a definition of drought, its onset and termination are difficult to determine. Typically, drought indices monitor the hazard's characteristics and define abnormally dry conditions. Reduced water availability builds up slowly, accumulates over time and can persist over several years; hence drought has been referred to as a 'creeping' disaster [11]. This complicates the identification of drought impacts, especially those with indirect causality and confounding factors; for example, it is difficult to disentangle whether tree mortality is caused by drought, pest infestations or heatwaves. Furthermore, drought impacts may linger for years after the actual drought has ended [12]. The variety of affected sectors is large, and their different response times to dry conditions, e.g. of different crop or forest types or water supply systems, are difficult to integrate into models [13]. Thus, compared to other hazards with more direct impacts, drought impacts are hard to identify and to quantify in monetary terms [14], resulting in limited impact data.
Consequently, a main challenge of impact-based prediction is the lack of quantitative drought impact data. Efforts to estimate impacts have predominantly focused on the agricultural sector [14,15]. Additionally, text-based drought impact databases or inventories have received scientific interest, as they not only archive past impacts but also structure them across the various affected sectors. The most commonly used inventories are the US Drought Impact Reporter by the National Drought Mitigation Center [16] and the European Drought Impact report Inventory (EDII) [17]. Recently, the Alpine Drought Impact report Inventory (EDII ALPS) was published, characterising the European Alps and their foothills as vulnerable to drought [18]. These inventories define an impact as a negative consequence of the drought hazard for the environment, society or economy and demonstrate that drought can have multifaceted negative consequences, with most impacts on agriculture, water supply and quality, and forestry. Several studies used the reported impacts archived in the EDII for impact modelling [13,19-22]. They modelled the statistical link between impacts and drought indices capturing drought conditions. All of them analysed each spatial unit separately and concluded that model performance and predictor selection were region-specific. Key results from these studies are (1) maps of the individually modelled regions with their specific impact probability, and (2) recommendations on selecting indices for different impact types and regions. To our knowledge, only one study has tested the potential to predict drought impacts with a lead time of several months [23]. Since these predictions were tested on past events, we refer to them as 're-forecasts'. None of the studies tested the models' performance in predicting drought impacts in other regions or into the far future.
This study aims to take drought impact modelling one step further, towards prediction beyond the training data environment. To this end, it builds on prior experience of linking reported impacts with suitable drought indices. In particular, we move from unit-specific 'local' models to a regional model that integrates spatial components. As the EDII ALPS is the most recent and spatially most complete source of drought impacts across heterogeneous terrain, we use this database to test the models' spatial transferability. Specifically, we test and evaluate two commonly used model types, asking the following questions: (i) How well do predicted impact estimates reflect reported drought impacts in other regions? (ii) How well do the models re-forecast drought impacts?

Study region and drought impact data
Europe's so-called 'Alpine Space' [24] covers the Main Alpine Crest and the Alps' foothills (figure 1(a)). The heterogeneous terrain includes high peaks up to 4848 m asl, deep valleys and steep slopes of all aspects, as well as plateaus and lowlands. Agriculture in this region is characterised by a long tradition of pasture farming at higher elevations, in addition to crop cultivation in the valleys. Forests, which serve as protection against landslides and avalanches, cover almost a third of the area, with a higher tree line in the Southern region. The population is heterogeneously distributed, with large cities in low-elevation areas and small villages at higher altitudes. Annual precipitation ranges from 400 to more than 3000 mm year⁻¹ [26], and the European Alps contribute large amounts of water to the four major European rivers Danube, Po, Rhine and Rhone [27]. Due to these water-rich characteristics, the Alps are recognised as Europe's Water Towers [28]. Despite the humid mountain climate, water scarcity is a serious issue in some parts of the area during certain periods, e.g. when the tourism season increases water demand [29]. Across the Alpine Space, drought impacts were reported particularly during the warm and dry years of 2003, 2015 and 2018 [18]. The Alpine Drought Impact report Inventory (EDII ALPS) archives these impact reports based on the text-coding approach of the EDII [17]. Currently, the database stores more than 3200 drought impacts classified into 13 categories and 96 subtypes, with most impacts localised at least to NUTS 3 regions (EDII ALPS V1.1, doi: 10.6094/UNIFR/230219). NUTS 3 is a spatial unit of the Nomenclature of Territorial Units for Statistics, which subdivides countries into smaller regions for statistical purposes [30]. The Alpine Space, and thus the EDII ALPS, covers 188 NUTS 3 regions.
For the model experiments, we defined two pairs of subregions within the Alpine Space according to [18]: (1) the Northern and Southern region (NorthernR and SouthernR), and (2) the high-altitude and pre-Alpine region (HighAltR and PreAlpR, figure 1(b)). For each of these four subregions, we extracted a monthly time series of impacts reported at least at NUTS 3 level (hereafter called NUTS regions). The period considered spans 2001 to 2020, and each impact had to include at least information on its start of occurrence, i.e. a specific year and season or month. If information about the end was unavailable, the impact was assumed to have occurred only in that given month or season. The reported impacts were then classified into impacts likely triggered by soil-moisture drought (D SM impacts) and/or hydrological drought (D H impacts), as proposed by [18]. Most D SM impacts relate to reports about reduced crop production and decreased plant vitality, and most D H impacts relate to reports about limited water supply, including irrigation, and reduced water quality. With these two impact groups, we thus aim to predict groups of drought impacts that we assume are caused by similar drought characteristics (for further details, see [18]). Impacts assigned to a specific month and NUTS region were translated into 'impact occurrence', while the remaining periods were labelled as 'no impact occurrence', converting the qualitative information of reported impacts into binary occurrence/non-occurrence data, similar to [31]. This yielded two binary time series with monthly resolution focusing on two drought-type conditions: occurrences of D SM impacts and of D H impacts. The number of occurrences varies strongly between the two time series and between NUTS regions (N s, figure 1(c)).
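The conversion of reports into binary monthly occurrences can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the `ImpactReport` record, its fields and the region identifiers are hypothetical, and a season-only report is assumed to have been expanded beforehand to the first and last month of that season.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month)


@dataclass
class ImpactReport:
    """Hypothetical minimal record of one reported impact."""
    nuts_id: str
    start: YearMonth
    end: Optional[YearMonth] = None  # None: no end reported


def month_index(ym: YearMonth) -> int:
    """Running month counter so that consecutive months differ by 1."""
    year, month = ym
    return year * 12 + (month - 1)


def binary_occurrence(reports: Iterable[ImpactReport], nuts_ids: List[str],
                      first: YearMonth = (2001, 1),
                      last: YearMonth = (2020, 12)) -> Dict[str, List[int]]:
    """Return {nuts_id: monthly 0/1 series} for the period first..last."""
    t0, t1 = month_index(first), month_index(last)
    series = {nid: [0] * (t1 - t0 + 1) for nid in nuts_ids}
    for r in reports:
        start = month_index(r.start)
        # Without an end date, the impact counts for its start month only.
        end = month_index(r.end) if r.end is not None else start
        for t in range(max(start, t0), min(end, t1) + 1):
            # Several reports in the same month still yield a single 1.
            series[r.nuts_id][t - t0] = 1
    return series


# Illustrative use with made-up region identifiers:
reports = [
    ImpactReport("REGION_A", (2003, 6), (2003, 8)),  # summer 2003
    ImpactReport("REGION_A", (2003, 8)),             # overlaps; no end date
]
occ = binary_occurrence(reports, ["REGION_A", "REGION_B"])
```

Months without any report stay 0, which is how 'no impact occurrence' arises; regions without a single report (REGION_B here) get an all-zero series.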

Predictor variables
As monthly predictors, we selected eight drought hazard indices to represent different drought characteristics captured by the monitoring of meteorological, hydrological and remote sensing data (table 1): the Standardised Precipitation Index (SPI) [34], the Standardised Precipitation Evapotranspiration Index (SPEI) [35], the Soil Moisture Index [36] of the topsoil layer (0-7 cm depth, SMI-1) and of the subsoil layer (7-28 cm depth, SMI-2), the Vegetation Condition Index (VCI) [37] and the Vegetation Health Index (VHI) [38]. For the SPI and SPEI, we selected accumulation periods of 3 months (SPI-3, SPEI-3) and 6 months (SPI-6, SPEI-6) to account for precipitation that initially fell as snow and was released later with warmer temperatures [39]. In addition to this snow accumulation effect, winter typically resets the hydrological storage system every year, which is why we did not consider longer accumulation periods. The indices were derived from the Alpine Drought Observatory [40], which monitors drought across the Alpine Space (https://ado.eurac.edu/), and were then aggregated to monthly means for each NUTS region. Four geographic variables for each NUTS region served to describe geographical variation (figure 1(d) and table 1): forest area (forest), agriculturally used land (agric), elevation (elev) and population density (popd). The predictors forest, agric and popd represent sensitivity factors of vulnerability and have often been used in drought risk analyses [41-43]. Elev represents topography, integrating one of the most prominent features of the study region. The predictors forest and agric are based on the most recent Copernicus land cover data (CLC) [25] and represent the area of any forest type and of any type of agricultural land, respectively, in each NUTS region.
Elevation (elev) represented the median elevation based on the digital elevation model provided by the European Environment Agency (EEA) [44], and the mean population density (popd) was obtained from Eurostat [30].
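The role of the accumulation period can be illustrated with a simplified SPI. In this study the indices were taken from the Alpine Drought Observatory, and the standard SPI fits a parametric (usually gamma) distribution to accumulated precipitation. The sketch below instead swaps in an empirical, nonparametric variant (ranking per calendar month with the Gringorten plotting position, then mapping to a standard normal quantile) so that it needs only the Python standard library. It assumes a gap-free monthly series that starts in January.

```python
from statistics import NormalDist
from typing import List, Optional


def accumulate(values: List[float], window: int) -> List[Optional[float]]:
    """Rolling sum over `window` months; the first window-1 entries are None."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        out.append(sum(values[i + 1 - window:i + 1]) if i + 1 >= window else None)
    return out


def empirical_spi(precip: List[float], window: int = 3) -> List[Optional[float]]:
    """Nonparametric SPI-like index (window=3 for SPI-3, 6 for SPI-6).

    Accumulated totals are ranked separately for each calendar month, so a
    dry March is only compared with other Marches. Ties keep input order,
    which is a simplification.
    """
    acc = accumulate(precip, window)
    spi: List[Optional[float]] = [None] * len(acc)
    std_normal = NormalDist()
    for cal_month in range(12):
        idx = [i for i in range(cal_month, len(acc), 12) if acc[i] is not None]
        n = len(idx)
        for rank, i in enumerate(sorted(idx, key=lambda k: acc[k]), start=1):
            p = (rank - 0.44) / (n + 0.12)  # Gringorten plotting position
            spi[i] = std_normal.inv_cdf(p)
    return spi
```

Longer windows smooth out single dry months and carry the signal of earlier precipitation deficits, which is why a 6-month window can reflect snow that fell months before it melts.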
The predictor 'season', used by several studies [13,23,45], accounted for the clear seasonality of drought impact occurrence. The reporting variable 'c weight' addressed the known differences in reporting across the countries of the Alpine Space; it is the sum of all D SM and all D H impact occurrences per country (Germany, Austria, Switzerland, France, Italy and Slovenia).
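A possible way to compute the reporting weight c weight from the binary occurrence series is sketched below. It assumes raw counts (the text does not state whether the totals were normalised) and derives each region's country from the first two characters of its NUTS code, which in the NUTS nomenclature encode the country. The region identifiers in the example are made up.

```python
from collections import defaultdict
from typing import Dict, List


def country_weight(dsm: Dict[str, List[int]],
                   dh: Dict[str, List[int]]) -> Dict[str, int]:
    """Assign to each NUTS region the total number of D SM plus D H impact
    occurrences in its country (hypothetical raw-count version)."""
    totals: Dict[str, int] = defaultdict(int)
    for series in (dsm, dh):
        for nuts_id, occurrences in series.items():
            totals[nuts_id[:2]] += sum(occurrences)  # country code prefix
    return {nuts_id: totals[nuts_id[:2]] for nuts_id in set(dsm) | set(dh)}


# Made-up identifiers: two Austrian-style codes and one German-style code.
dsm = {"AT001": [1, 0, 1], "AT002": [0, 0, 0], "DE001": [1, 1, 1]}
dh = {"AT001": [0, 1, 0], "AT002": [1, 0, 0], "DE001": [0, 0, 0]}
weights = country_weight(dsm, dh)
```

All regions of a country share the same value, so the predictor can only shift impact probabilities between countries, not between regions within one country.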

Model types, tested experiments and evaluation
We applied two different model types to ensure that our results are not an artefact of a specific approach (figure 2(a)): logistic regression (LR) and random forest (RF), both frequently applied to binary occurrence data, including reported drought impacts [22,23,47-49]. The model experiments included only data for which all predictors were available at monthly resolution; for example, cloud cover prevented the calculation of the VCI and VHI for some regions during certain periods. For LR, we log-transformed the predictors popd, forest and agric to achieve a more even coverage of the predictor space.
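The data selection and transformation described above could look like the following sketch. Rows are assumed to be dictionaries of predictor values in which `None` marks a missing value (e.g. VCI or VHI under cloud cover). The natural logarithm is assumed, and popd, forest and agric are assumed to be strictly positive, since the text does not say how zeros were handled.

```python
import math
from typing import Dict, List, Optional, Sequence

LOG_PREDICTORS = ("popd", "forest", "agric")


def prepare_rows(rows: List[Dict[str, Optional[float]]],
                 predictors: Sequence[str],
                 for_regression: bool = True) -> List[Dict[str, float]]:
    """Keep complete cases only; log-transform skewed predictors for LR."""
    prepared = []
    for row in rows:
        if any(row.get(p) is None for p in predictors):
            continue  # drop months where any predictor is missing
        new_row = dict(row)  # leave the input untouched
        if for_regression:  # tree splits depend mainly on value order
            for p in LOG_PREDICTORS:
                if p in new_row:
                    new_row[p] = math.log(new_row[p])  # requires value > 0
        prepared.append(new_row)
    return prepared
```

The transform is applied only for LR: a linear predictor is sensitive to long right tails such as population density, whereas tree-based RF is largely insensitive to monotone transforms.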
To test the predictive power and transferability of the models, we applied two model experiments in space and one in time (figure 2(b)). The predictions in space evaluated to what degree the models can predict impact occurrences in other regions, i.e. in regions omitted from the training data. The models' potential to predict in space was assessed at two spatial scales. 'S within' considered each subregion individually and thus predicted to more similar conditions. S within predicted impact occurrences following the leave-one-out method: the impact occurrence in one NUTS region was predicted with a model trained on the data of all remaining NUTS regions, and this process was repeated until all NUTS regions in the considered subregion had been predicted. 'S other' tested the prediction to contrasting climatic and altitudinal conditions. To this end, S other trained the models with the data from one subregion and predicted the impacts in all NUTS regions of the other subregion of the pair.
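The two spatial experiments amount to different train/test partitions of NUTS regions. The sketch below only generates these partitions (model fitting is left out); the pairing follows the two subregional pairs defined above, and the subregion membership in the example is made up.

```python
from typing import Dict, Iterator, List, Tuple

# Each subregion is predicted from its counterpart in the pair.
COUNTERPART = {"NorthernR": "SouthernR", "SouthernR": "NorthernR",
               "HighAltR": "PreAlpR", "PreAlpR": "HighAltR"}


def s_within_splits(regions: List[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Leave-one-out: train on all NUTS regions but one, predict the held-out one."""
    for held_out in regions:
        train = [r for r in regions if r != held_out]
        yield train, [held_out]


def s_other_split(subregions: Dict[str, List[str]],
                  target: str) -> Tuple[List[str], List[str]]:
    """Train on the counterpart subregion, predict all NUTS regions of `target`."""
    return subregions[COUNTERPART[target]], subregions[target]


# Made-up membership; a NUTS region can belong to one subregion of each pair.
subregions = {"NorthernR": ["n1", "n2", "n3"], "SouthernR": ["s1", "s2"],
              "HighAltR": ["n1", "s1"], "PreAlpR": ["n2", "n3", "s2"]}
within = list(s_within_splits(subregions["NorthernR"]))
```

S within yields as many model fits as there are NUTS regions in the subregion, while S other needs a single fit per subregion, which explains the much larger share of prediction data in S other.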
The experiment 'T month' predicted impacts of the next month using all preceding data, simulating a re-forecast. In this way, the training dataset grew from month to month over the whole period. The first predicted month was January 2005, with a model fitted to data from January 2001 to December 2004. The last predicted month was December 2020, with all data from January 2001 to November 2020. To reduce complexity, the models did not use weather forecasts but the monitored index values to predict occurrences. Hence, T month focused solely on impact prediction and not on the hazard component.
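The expanding training window of T month can be enumerated as below. The function only returns, for each predicted month, the first and last month of the corresponding training period; fitting and prediction are left out.

```python
from typing import Iterator, Tuple

YearMonth = Tuple[int, int]  # (year, month)


def to_index(ym: YearMonth) -> int:
    """Running month counter."""
    return ym[0] * 12 + ym[1] - 1


def to_year_month(i: int) -> YearMonth:
    """Inverse of to_index."""
    return i // 12, i % 12 + 1


def t_month_splits(first: YearMonth = (2001, 1),
                   first_pred: YearMonth = (2005, 1),
                   last: YearMonth = (2020, 12)
                   ) -> Iterator[Tuple[Tuple[YearMonth, YearMonth], YearMonth]]:
    """Yield ((train_start, train_end), predicted_month) with an expanding
    window: every month before the predicted one is used for training."""
    for t in range(to_index(first_pred), to_index(last) + 1):
        yield (first, to_year_month(t - 1)), to_year_month(t)


splits = list(t_month_splits())
```

With the dates given above this yields 192 monthly predictions (2005 to 2020) for each subregion, impact type and model type, with the training period growing from 48 to 239 months.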
We applied S other, S within and T month in each subregion and therefore evaluated 48 model predictions (2 model types, 3 experiments, 4 subregions, 2 impact types, figure 2). First, we calculated the Area Under the receiver operating characteristic Curve (AUC) [46]. The AUC has been widely used for probabilistic forecasts, as it measures prediction accuracy based on the true positive rate (sensitivity) and the false positive rate (1 − specificity) across all probability thresholds. The AUC ranges from 0 to 1, where AUC = 1 is a perfect score and AUC ⩽ 0.5 indicates no skill. Second, we evaluated the predictions of S within, S other and T month over time (N t) and space (N s) by comparing the sum of predicted occurrence probabilities with the sum of reported occurrences:
$$N_t = \sum_{s} I_{t,s}, \qquad N_s = \sum_{t} I_{t,s}$$
where t represented a month between January 2001 and December 2020 and s a NUTS region in the considered subregion. I represented the impact occurrence, either binary (I ∈ {0, 1}) when N t and N s were calculated for reported occurrences, or as a probability (I ∈ [0, 1]) when they were calculated for predicted occurrences. The square of Pearson's r (r²) between observed and predicted N t and N s helped to identify whether the models predicted better at the temporal or the spatial scale.
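Both evaluation measures can be sketched in a few lines of standard-library Python. The AUC is computed here via its equivalence with the Mann-Whitney U statistic, i.e. the probability that a randomly chosen month with a reported impact receives a higher predicted probability than one without (ties count one half); this is a swapped-in formulation and not necessarily the routine used in the study. The nested-list layout `values[s][t]` (regions by months) is an assumption.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks.
    Undefined (division by zero) if only one class is present."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def aggregate(values: List[List[float]]) -> Tuple[List[float], List[float]]:
    """values[s][t]: occurrence (0/1) or probability I for region s, month t.
    Returns (N_t per month, N_s per region)."""
    n_t = [sum(column) for column in zip(*values)]
    n_s = [sum(row) for row in values]
    return n_t, n_s


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Square of Pearson's correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Observed and predicted matrices are aggregated separately and r² is then computed on the two N t vectors and on the two N s vectors. The AUC is undefined where a region or season has no reported impact at all, which is relevant for the sparse winter data.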

Predictive model performance
Impact predictions were more successful within a region (S within) than to other regions (S other, figure 3). D SM impacts were more difficult to predict than D H impacts, most pronounced for the predictions in space. Since fewer impacts were recorded for winter, the model predictions in winter were also of poorer overall quality (figure 3(b)). For the other seasons, performance was better, on average best for autumn (median AUC = 0.87), followed by summer and spring. LR outperformed RF in most cases, most clearly when predicting to other regions (median AUC = 0.85 and 0.78, respectively, figure 3(a)). RF performed slightly better within a region, with less difference between D SM and D H impacts and with the best overall prediction of D H impacts in the Northern region (AUC = 0.93).
LR showed better skill than RF for the re-forecast T month, in particular for D SM impacts (median AUC = 0.87 and 0.83, respectively, figure 3(a)). In contrast to the model experiments in space, T month predicted the winter season substantially better. Moreover, T month did not result in consistent differences between D SM and D H impacts. Even though the training data increased, the performance of T month showed no improvement over time for any subregion or model type (supplementary data, figure S1).

Aggregated impact predictions in space and time
According to the aggregated predictions versus observations (figure 4), LR outperformed RF in most cases, with higher r². Both model types showed consistently higher r² for predictions within a region than to other regions, and better results for D H impacts than for D SM impacts, consistent with the AUC results.
Regarding the model experiments in space, higher r² for N t than for N s indicated better prediction for specific time periods than for specific NUTS regions: in both S other and S within, r² was lowest for the aggregation per NUTS region (N s) and highest for the monthly aggregation (N t).
Regarding the re-forecast T month, LR generally performed better than RF. In particular, RF showed little skill in assigning impact probabilities to the months in which the EDII ALPS reports impacts (r² of N t ⩽ 0.25).

Discussion
Our prediction experiments showed that hydrological drought (D H) impacts can be predicted better than soil-moisture drought (D SM) impacts in all prediction experiments. Within a region, predictions were generally good, and random forest (RF) was slightly superior to logistic regression (LR), whereas across regions LR fared substantially better. LR also showed slightly better skill than RF in the re-forecast experiment.

Prediction within and to other regions
As the drought characteristics of the prediction data deviated from those of the training data, RF's fundamental inability to extrapolate beyond the range of the training data became problematic and reduced prediction accuracy when predicting to regions with very different climatic conditions (from the Northern to the Southern region and vice versa) and altitudinal conditions (from the pre-Alpine to the high-altitude region and vice versa). In those cases, LR performed substantially better. For prediction within a region, the training data appeared to cover a sufficient range to enable better performance by RF. In conclusion, for similar regions with enough training data we suggest using the RF algorithm, but when predicting to conditions outside the training data environment, we suggest using regression approaches.
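The extrapolation argument can be made concrete with a toy example. Every tree in a random forest is piecewise constant, so along a predictor beyond its largest training value each tree, and hence their average, returns the value of its outermost leaf, whereas a logistic model continues its fitted trend. The sketch contrasts a single-split regression tree (a stump, standing in for one RF tree) with a one-predictor logistic regression fitted by gradient descent, on made-up data where larger x means drier conditions. It illustrates the mechanism only; whether a continued LR trend is realistic depends on the data.

```python
import math
from typing import Callable, List


def fit_stump(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Single-split regression tree: pick the threshold with the lowest
    squared error; each leaf predicts the mean response of its samples."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - m_left) ** 2 for y in left)
               + sum((y - m_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    _, thr, m_left, m_right = best
    return lambda x: m_left if x <= thr else m_right


def fit_logistic(xs: List[float], ys: List[float],
                 lr: float = 0.5, steps: int = 5000) -> Callable[[float], float]:
    """Logistic regression p = 1 / (1 + exp(-(a + b x))) fitted by batch
    gradient descent on the mean log-loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda x: 1.0 / (1.0 + math.exp(-(a + b * x)))


# Made-up training data: dryness x in [0, 1.5], impacts (1) more frequent when drier.
xs = [i / 10 for i in range(16)]
ys = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
stump, logit = fit_stump(xs, ys), fit_logistic(xs, ys)
# Beyond the training range (e.g. x = 3.0) the stump repeats its edge value,
# while the logistic curve keeps rising.
```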
The consistently better predictions of D H impacts compared to D SM impacts probably resulted from the larger sample size of D H impacts (n = 541) compared to D SM impacts (n = 336). In addition, D H impacts exhibit a stronger seasonality of occurrence [18], allowing the models to identify patterns of D H impacts and hence to yield better predictions.
The seasonal variability of predictive performance (figure 3) supports the notion that accuracy is affected by the impacts' sample size. Impacts were best predicted in autumn, followed by summer, spring and winter, broadly corresponding to the number of reported impacts, which was highest in summer (n = 1954), followed by autumn (n = 604), spring (n = 476) and winter (n = 136). Performance was worst in winter, with no skill for regions where the EDII ALPS reported almost no impacts. One reason might be that during winter the value range of the drought indices used as predictors differed from that of the training data. In particular, when predicting from the high-altitude region to the pre-Alpine region (or vice versa), the winter season differed most, which might explain the poor winter predictions. Another reason might be limited public awareness of drought in winter, which might have led to few reports on winter impacts despite their occurrence. Studies [50,51] highlight the relevance of winter for droughts, as cumulative drought effects from summer and autumn can result in winter impacts. Typical examples from the EDII ALPS are the lack of water for artificial snow production and thus fewer visits to ski resorts. However, the small number of such reports indicates that they were of little relevance in the past [18]. With climate change and rising temperatures, winter impacts might increase in the future. Therefore, a better understanding of the direct and prolonged consequences is essential to improve the presented drought impact predictions.
The best prediction serves as a potential application case and is therefore illustrated in more detail (figure 5). For this prediction of D H impacts in the Northern region, the largest amount of occurrence data (n = 354) was available. The predictions matched the reported peaks of impact occurrences in 2003, 2011, 2015 and 2018, although less clearly for the latter two events (figure 5(a)). This underestimation might be due to more active reporting during the later peaks, driven by digitisation and higher interest over the last decade, a trend that was not captured by the model fit. In contrast to the peak periods, the model predicted some impact probability during periods without any reported occurrences, e.g. in June 2006. Furthermore, the model assigned impact probabilities to all NUTS regions, whereas the EDII ALPS did not report any D H impact for 29 of the 122 NUTS regions in the Northern region (figure 5(b)). Thus, the prediction distributed impacts more homogeneously. The overestimation during certain periods and for specific regions might be an artefact of gaps in the EDII ALPS, in accordance with [18], who presented several reasons for potentially missed impacts. Furthermore, the example clearly shows the effect of the predictor for reporting behaviour (c weight), which accounts for country-specific differences: the predicted impact probabilities rank in the same order as the reported impacts, highest in France, followed by Switzerland and then by Austria and Germany. In addition to varying reporting behaviour, we assume that country-specific vulnerability could also have driven the differences between the countries. Including the predictor c weight appears essential for correcting the sampling bias in the EDII ALPS, but the predictor is simplified and could be refined further. The predicted peaks estimate where and when further impacts may have occurred and could thus serve as a tool to identify gaps in the database.

Re-forecasting skill
LR performed better than RF in the re-forecast T month and thus better predicted the impacts of the next month (figure 3(a)), confirming its better ability to extrapolate. Furthermore, the evaluation with r² and the AUC (figures 3(b) and 4) showed that LR was better than RF at assigning impact probabilities to specific time periods, in particular during winter. One reason for the better winter performance of T month compared with the spatial experiments might be the higher share of training data relative to the prediction data, which covered only a single month. In contrast, the share of prediction data in S other was substantially higher, since S other predicted the whole period for all NUTS regions in the other subregion. Hence, more training data probably led to better predictions.
However, the AUC evaluating T month on a yearly basis did not indicate any trend of improved performance, despite the growing training dataset (supplementary data, figure S1). Since the AUC varied from year to year, the data used to fit the models might be below the density of records needed to identify all patterns, even if data from the whole period were used. For selected single regions with good data coverage, [23] showed re-forecasting skill up to seven months ahead, highlighting the potential of models using reports as proxy data also for longer lead times. To overcome data scarcity, text-mining approaches may help and could be implemented in near real time [49]. However, the text-mining approaches developed so far still have to cope with the correct timing and classification of drought impacts and are also affected by reporting behaviour.
The re-forecast was implemented with a 1-month lead time, showed good performance throughout the year and demonstrates the potential of reported impacts as proxy data also for regions of larger spatial extent. For more robust results, we suggest comparing the estimates based on reported impacts with other quantitative data, in particular when the model is developed for specific impact types. For example, when predicting agricultural impacts, the results could be compared with yield loss data, similar to [52]. For an operational implementation, the model would also have to predict the hazard component, e.g. with numerical weather prediction models. Coupling these two models adds uncertainty but is required to complement early warning systems [2].

Limits and future directions
The models used predictors that other studies have identified as suitable for quantifying drought impacts [22,31,47,48,53]. Even though some of the selected drought indices are correlated, this study used them in combination for two reasons: (1) we assume a similar level of correlation in the prediction data, meaning that predictive performance is not affected by the correlation in the training data [54], and (2) we aim for the best prediction skill, not to identify the best explanatory variables. The predictions might be further improved by integrating hydrological data, such as discharge or snow measurements, especially for the winter season and higher-elevation regions. Data on snow and glacier melt may help to account for water storage that is released with warmer temperatures. Integrating such processes may better capture the delayed occurrence of drought impacts, in particular from upstream to downstream [55]. In addition, process-based components capturing interactions and feedback loops could improve performance. Most impact-based predictions of natural hazards are simplified, owing to limited experience compared with forecasting the hazards themselves, such as floods and storms [2]. In the drought context, limited impact data also lead to simple model structures that may miss more complex processes.
The models used text reports from the EDII ALPS as a proxy to predict impact occurrences, an approach applied by several studies due to the lack of alternative quantitative data [13,22,23,56]. Even though these studies highlighted the potential to provide useful information for drought risk management, converting text data into quantitative data has limitations. Some impacts may not have been recognised, described or published and were thus missed. On the other hand, an increasing number of reported impacts might reflect higher awareness rather than higher impact severity [13,17]. It is thus difficult to disentangle reporting behaviour from true impacts and their severity. To reduce such bias, the predictions used a binary response and a predictor accounting for country-specific reporting behaviour, but this bias could be addressed further.
Both impact types integrate a broad range of impacts. D SM impacts cover consequences such as reduced productivity of crop cultivation, yield losses or reduced tree vitality, and D H impacts cover the availability of irrigation water, water supply, water quality or mortality of aquatic species. Each of these impacts has its specific causes, in the drought hazard component but also in exposure and vulnerability characteristics. With better data availability, the implemented method could be tested on more specific impacts with impact-type-dependent predictors.
This study implemented LR and RF, since both have previously been used for modelling drought impacts based on text reports [22,23,47-49]. Despite very different optimisation processes, the results showed that both model types are generally suitable for taking drought impact modelling one step further towards prediction. However, the more modern RF approach proved inferior to LR when predicting beyond the training data environment. When using the report data as a count response, model types such as Poisson regression or negative binomial regression could be suitable alternatives.

Conclusion
The drought impact predictions presented in this study reveal particular limits and potentials. Using reported impacts and the most common monitoring indices, the implemented models are the first to integrate spatial components and to test the prediction of impact occurrence in regions outside the training data area. The results highlight the potential to identify and fill gaps in data-sparse regions, though with considerably more confidence for similar regions. For model applications to different conditions, our results suggest that logistic regression extrapolates better than random forest. Winter impacts are not predicted well but occur rarely, and their prediction might be improved by targeted data, including snow data. Both model types showed good re-forecasting skill with a one-month lead time. However, in this setup the growing training data did not improve prediction performance. Further work might test improvements from a larger range of drought events and additional impact data, more targeted or additional predictors and different model structures. Comparisons with complementary data sources and process-based approaches are needed to better understand the causal relations between drivers and impacts, and these should be used in combination for drought early warning.

Data availability statement
The data cannot be made publicly available upon publication because they are owned by a third party and the terms of use prevent public distribution. The data that support the findings of this study are available upon reasonable request from the authors.