Risks for the global freshwater system at 1.5 °C and 2 °C global warming

To support implementation of the Paris Agreement, the new HAPPI ensemble of 20 bias-corrected simulations of four climate models was used to drive two global hydrological models, WaterGAP and LPJmL, for assessing freshwater-related hazards and risks in worlds approximately 1.5 °C and 2 °C warmer than pre-industrial. Quasi-stationary HAPPI simulations are better suited than transient CMIP-like simulations for assessing hazards at the two targeted long-term global warming (GW) levels. We analyzed seven hydrological hazard indicators that characterize freshwater-related hazards for humans, freshwater biota and vegetation. Using a strict definition for significant differences, we identified for all but one indicator that areas with either significantly wetter or drier conditions (calculated as percent changes from 2006–2015) are smaller in the 1.5 °C world. For example, 7 day high flow is projected to increase significantly on 11% and 21% of the global land area at 1.5 °C and 2 °C, respectively. However, differences between hydrological hazards at the two GW levels are significant on less than 12% of the area. GW affects a larger area and more people by increases, rather than by decreases, of mean annual and 1-in-10 dry year streamflow, 7 day high flow, and groundwater recharge. The opposite is true for 7 day low flow, maximum snow storage, and soil moisture in the driest month of the growing period. Mean annual streamflow shows the lowest projected percent changes of all indicators. Among country groups, low income countries and lower middle income countries are most affected by decreased low flows and increased high flows, respectively, while high income countries are least affected by such changes. The incremental impact between 1.5 °C and 2 °C on high flows would be felt most by low income and lower middle income countries, the effect on soil moisture and low flows most by high income countries.


Introduction
It is well-established that alterations of the freshwater systems increase with increasing levels of global warming (GW) and that potential negative impacts of climate change outweigh positive ones (Döll et al 2015, Jiménez-Cisneros et al 2014). In recent years, state-of-the-art multi-model studies in which a number of global hydrological models (GHMs) were driven by bias-corrected output of selected climate models (GCMs) supported this knowledge. Translation of the transient scenarios until the year 2100 to GW levels was achieved by selecting, for each GCM and emissions scenario, the respective future time periods that correspond to a certain level of GW (Schewe et al 2014). However, to support implementation of the Paris Agreement (UNFCCC 2015), which considers long-term target levels of GW, it is preferable to quantify climate change risks for steady-state conditions at 1.5 °C and 2 °C GW above pre-industrial levels, and not under transient conditions where both GW levels are only temporarily reached and then afterwards exceeded (Mitchell et al 2016). Time sampling from small ensemble CMIP5 simulations neglects multi-decadal natural variability and the impact of aerosol scenarios, assumes independence of the impacts from the global mean surface-air temperature pathway (James et al 2017) and does not allow for a comprehensive analysis of extreme weather events (Schleussner et al 2016a). Although the independence assumption is valid for temperature and precipitation extremes (Pendergrass et al 2015, Seneviratne et al 2016), it does not hold for changes in groundwater recharge or snowpack, evapotranspiration and annual streamflow (Donnelly et al 2017, Portmann et al 2013). These shortcomings of the existing transient climate scenario approach for scientifically supporting the Paris Agreement gave rise to the 'Half a degree Additional warming, Projections, Prognosis and Impacts' (HAPPI) experiment (Mitchell et al 2017).
A main goal of the HAPPI experiment is to provide climate scenarios that are better suited to (1) describe how the climate might differ from today in worlds that are 1.5 °C or 2 °C warmer than under pre-industrial conditions and (2) distinguish climate change risks in a 1.5 °C world from those in a 2 °C world, in particular for risks related to extreme events. To address the large internal variability of the climate system, i.e. fluctuations that arise in the absence of anthropogenic forcing, each of the four involved GCMs produced an ensemble of 20 independent model runs for each of three quasi-stationary climate conditions: (1) a decade with the GW level of the historical period 2006-2015, (2) a decade that is about 1.5 °C warmer than pre-industrial (1861-1880) conditions and (3) a decade that is about 2.0 °C warmer than pre-industrial (1861-1880) conditions.
Global-scale assessments of the risks of climate change on freshwater systems have often been restricted to long-term average annual runoff or streamflow (e.g. Davie et al 2013, Schewe et al 2014, Schleussner et al 2016b). However, humans and other living beings may be affected more by other changes in water flows and storages. If, for example, the source of human water supply is groundwater, alteration of mean annual groundwater recharge is of interest (Döll 2009), while alteration of low flows is relevant in case of water supply from rivers, but also for characterizing the risk for freshwater biota (Döll and Zhang 2010). Global patterns of wetting and drying due to climate change were found to be consistent between mean annual runoff and statistical low flows, but the area where low flows were halved was computed to be almost twice as large as for mean annual runoff (Döll and Müller Schmied 2012). Therefore, a comprehensive assessment of freshwater-related climate change risks (according to IPCC 2014) should encompass quantification of various indicators of freshwater-related hazards. Indicator selection should be based on an analysis of who may suffer from a change in specific water flows and storages in various compartments (like groundwater or soil), taking into account magnitude, timing and frequency.
The objective of this paper is to assess freshwater-related hazards and risks of climate change at the global scale in worlds that are 1.5 °C and 2 °C warmer than pre-industrial. In particular, we determine where different types of hydrological hazards differ significantly between both warming levels. In our study, the bias-corrected HAPPI output of four GCMs served to drive two state-of-the-art GHMs at a spatial resolution of 0.5° geographical latitude by 0.5° geographical longitude. The daily output from the resulting eight GCM-GHM combinations was used to quantify seven hydrological indicators (HI), with the percent change between the HI value for the historical decade and the value under each GW level being defined as the physical hazard caused by anthropogenic climate change. The challenge was to make best use of the newly available HAPPI climate ensemble consisting of 20 independent 10 year time series per GCM for each of the three evaluation periods to quantify significant hazards at 1.5 °C and 2 °C GW and to determine where the hazards differ significantly between the two warming levels. Freshwater-related risks to humans are characterized by relating the hazard to exposure using a population scenario, and by taking into account vulnerability and coping capacity by analyzing the results for four groups of countries that differ in terms of current per capita gross national income.

Climate forcing, simulations, data and output analysis
HAPPI experiments are atmosphere-only with prescribed atmospheric forcing, sea-surface temperature and sea-ice coverage (Mitchell et al 2017). Forcing for the recent decade 2006-2015 in HAPPI simulations follows observations during this time period, including increasing CO2 concentrations (381-403 ppm). Radiative and sea-surface temperature forcing for the 1.5 °C experiment were taken from RCP2.6, whereas forcing for the 2 °C experiment was derived from RCP2.6 and RCP4.5. Sea-surface temperature fields were calculated by adding multi-ensemble end-of-century mean CMIP5 anomalies to the observed 2006-2015 values. In this study, the output of four GCMs following the HAPPI protocol was used: CAM4 (Neale et al 2013), MIROC5 (Shiogama et al 2014), MPI-ECHAM6.3 (Stevens et al 2013) and NorESM1 Happi (Bentsen et al 2013, Iversen et al 2013). Each of the 20 model runs of a GCM performed for each of the three 'decades' differs from the others in its initial weather state. In order to enable utilization of these simulations by GHMs, a trend-preserving bias correction method was applied (Hempel et al 2013). Following the ISIMIP modelling protocol (Frieler et al 2016), the output of the HAPPI simulations was first re-gridded to a 0.5° × 0.5° regular grid and then bias corrected using the EWEMBI (EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP) dataset (Lange 2017). This procedure was applied to the 20 member ensemble of each of the four HAPPI GCMs and has been shown to substantially improve the ensemble mean performance while preserving the ensemble variability.
To compute hydrological variables such as streamflow or groundwater recharge, each HAPPI climate forcing was used to drive two GHMs, WaterGAP and LPJmL (appendix A1). Based on the hydrological variables, seven HIs were computed (section 2.2). Then, hydrological hazard indicators (HHIs) for the two GW levels 1.5 °C and 2 °C were calculated (section 2.2). To classify HHIs or differences of the HHI for 1.5 °C and 2 °C GW as significant, we developed a strict definition described in appendix A2. Finally, freshwater-related risks were evaluated by taking into account population data and classification of countries by income (appendix A3).

Hydrological indicators and hydrological hazard indicators
The seven HIs analyzed in this study include four indicators describing the streamflow regime, mean annual diffuse groundwater recharge (i.e. renewable groundwater resources), soil moisture in the driest month during the historical growing period (critical soil moisture) and snow storage in the calendar month with highest storage during the historical decade (table 1). For critical soil moisture, the evaluation season is limited to the growing period at the beginning of the 21st century while the month with the lowest soil moisture may vary between historical and future climate. While soil moisture in the spatially varying rooting zone is used to calculate S soil min in case of WaterGAP, it is soil moisture in the uppermost meter in case of LPJmL. Future snow storage is evaluated for the calendar month with the highest snow storage during the historical decade. With GW, peak snow storage can be expected to move to earlier spring dates, and it is of interest to know how much snow remains stored during the historical calendar month as this is available for streamflow augmentation afterwards (Harpold et al 2017).

Each HHI is computed as the percent change of the HI relative to the historical decade,

\[ \mathrm{HHI}_{\mathrm{GWL,MC,R}} = \frac{\mathrm{HI}_{\mathrm{GWL,MC,R}} - \mathrm{HI}_{\mathrm{hist,MC}}}{\mathrm{HI}_{\mathrm{hist,MC}}} \times 100, \]

where HI GWL,MC,R is the HI computed for the GW level, model combination and run and HI hist,MC is the quasi-stationary value of the HI for the historical decade averaged for each model combination over all 20 ensemble runs. For determining whether the hydrological hazards in the 2 °C world are significantly different from the hazards in the 1.5 °C world, the absolute values of the HHIs were evaluated, i.e. the magnitude of the hazards at both GW levels. Absolute values were used because hazard can be assumed to increase with the deviation of HI from historic conditions, no matter if changes are positive or negative, for example when considering freshwater habitat conditions.
As HHIs are computed as relative HI changes, the occurrence of very low HI hist,MC values may lead to very high HHI values that cause a strong distortion of average values (either averages over model combinations or over grid cells). Therefore, grid cells for which at least one of the eight model combinations did not exceed the HI-specific thresholds listed in table 1 were not considered in any further analysis.
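The percent-change definition and the threshold-based exclusion described above can be sketched for a single grid cell as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and it assumes that the threshold test is applied to the historical ensemble mean of each model combination.

```python
from statistics import mean


def hhi_percent(hi_gwl_runs, hi_hist_runs, threshold):
    """Per-run HHI (percent change) for one grid cell and one model combination.

    hi_gwl_runs:  HI values of the ensemble runs at one GW level
    hi_hist_runs: HI values of the ensemble runs for the historical decade
    threshold:    HI-specific minimum for the historical value (table 1)
    Returns None if the historical ensemble mean does not exceed the
    threshold (assumed exclusion rule).
    """
    hi_hist = mean(hi_hist_runs)  # HI_hist,MC: average over all runs
    if hi_hist <= threshold:
        return None
    return [100.0 * (hi - hi_hist) / hi_hist for hi in hi_gwl_runs]


def ensemble_mean_hhi(per_combination_hhi):
    """Mean HHI over model combinations for one grid cell.

    The cell is dropped (None) as soon as one model combination was
    excluded by the threshold test.
    """
    if any(runs is None for runs in per_combination_hhi):
        return None
    return mean(mean(runs) for runs in per_combination_hhi)
```

With a historical mean of 10 m³ s⁻¹, future values of 12 and 8 m³ s⁻¹ give HHIs of +20% and −20%, whereas a cell with a historical mean below the threshold is removed for all model combinations.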

Spatial distribution of three selected hydrological hazards
Hydrological hazards in the 1.5 °C and 2 °C worlds regarding mean annual streamflow Q m are shown in figure 1. In the 1.5 °C world, HHI 1.5 °C (Q m), computed as the mean of the relative changes of Q m between historical and future quasi-stationary conditions over the eight model combinations, is projected to be mostly below 10%. Still, many of these small changes are found to be significant, indicated by full colors (figure 1(a)). In the 2 °C world, the projected increases of Q m become higher in the Arctic, India and Southeast Asia (figure 1(b)). It is noteworthy that the insignificant changes of Q m around the Mediterranean in the 1.5 °C world become significant decreases of 10%-30% in the 2 °C world. Significant decreases are found on 13% of the global land area excluding Greenland and Antarctica (GLA) for both GW levels, while significant increases are projected to occur on 21% for the 1.5 °C GW level but on 26% for the 2 °C level. Visualizing the ratio of the absolute values of the HHIs at the two GW levels, figure 1(c) shows where the magnitude of Q m change is higher for the 2 °C world (indicated by orange and red colors) and where for the 1.5 °C world (indicated by green and blue colors). As expected, the hazard is dominantly higher for the 2 °C world. According to our strict definition of significance (appendix A2) only 11% of the GLA show a significantly larger hazard in the 2 °C world than in the 1.5 °C world (as indicated by full color orange and red). These areas are predominantly found at very high latitudes and in Southern Europe, India and the Congo basin. On 0.5% of the GLA, the hazard is significantly higher for the 1.5 °C world (indicated by full color green and blue).

Table 1. Hydrological indicators (HI) for assessing freshwater-related climate change hazards. All indicators are computed from 10 years of daily values and describe the average behavior over 10 years. All indicators are computed for 0.5° grid cells.
Grid cells for which at least one of the eight model combinations does not exceed the HI-specific threshold value (third column) were not considered in the hazard analysis for this HI.

HI | HI abbrev. | Calculation (threshold value) | Affected by hazard
Mean annual streamflow | Q m | Arithmetic average of 10 annual streamflow values (0.1 m³ s⁻¹) | Number of fish species ᵃ, groundwater-dependent floodplain vegetation
1-in-10 dry year streamflow | Q 1−in−10dy | The lowest annual streamflow value of the 10 year run (0.1 m³ s⁻¹) | Human water supply from river water, habitat for freshwater biota, wastewater dilution
7 day low flow | Q 7lf | Lowest value of all rolling means of daily streamflow during every consecutive seven days period in each year, with label in center ᵇ (0.1 m³ s⁻¹) | Same as Q 1−in−10dy plus inland water transport
7 day high flow | Q 7hf | |

Hydrological hazards regarding extreme low flows, HHI(Q 7lf), could only be computed for about half of the GLA (mainly along major rivers) because the threshold value for historical conditions was not reached for the other grid cells (tables 1 and B1). A comparison of HHI 1.5 °C (Q 7lf) (figure 2(a)) with HHI 1.5 °C (Q m) (figure 1(a)) clearly shows that, not unexpectedly, the relative changes for the low flows are very often higher than for the annual means. In the tropical Amazon, Congo and Indonesian basins Q 7lf decreases by >10%, while Q m decreases less, and in the southwestern part of Russia, Q 7lf increases by >30% but Q m by <10%. Compared to the 1.5 °C world, the projected increases of Q 7lf become higher in the 2 °C world in higher northern latitudes and in eastern Africa, India and Southeast Asia (figure 2(b)). Projected decreases of Q 7lf intensify in the Amazon basin, Western USA, central Canada, and in Southern and Western Europe, but not in the Congo basin or Indonesia, where models agree less on the sign of change under 2 °C GW than under 1.5 °C GW. Significant decreases (increases) are found on 13% (10%) of the GLA for the 2 °C GW level but only on 10% (9%) for the 1.5 °C GW level.
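As an illustration of how the streamflow indicators of table 1 could be derived from daily model output, the following sketch computes Q m, Q 1−in−10dy and Q 7lf for one grid cell. It rests on assumptions not stated in the text: annual streamflow is taken as the mean of the daily values of a year, the 7 day windows do not cross year boundaries, and the annual 7 day minima are averaged over the years of the run. All function names are hypothetical.

```python
def annual_means(daily_by_year):
    """Annual mean streamflow for each simulated year.

    daily_by_year: list of years, each a list of daily streamflow values.
    """
    return [sum(year) / len(year) for year in daily_by_year]


def mean_annual_streamflow(daily_by_year):
    """Q m: arithmetic average of the annual streamflow values."""
    means = annual_means(daily_by_year)
    return sum(means) / len(means)


def one_in_ten_dry_year_streamflow(daily_by_year):
    """Q 1-in-10dy: lowest annual streamflow value of the (10 year) run."""
    return min(annual_means(daily_by_year))


def seven_day_low_flow(daily_by_year):
    """Q 7lf: lowest 7 day rolling mean of each year, averaged over years.

    Averaging the annual minima is an assumption of this sketch.
    """
    annual_minima = []
    for year in daily_by_year:
        # one rolling mean per window of seven consecutive days
        rolling = [sum(year[i:i + 7]) / 7 for i in range(len(year) - 6)]
        annual_minima.append(min(rolling))
    return sum(annual_minima) / len(annual_minima)
```

Because Q 7lf is driven by the driest week of each year, it can fall below the table 1 threshold in cells where Q m clearly exceeds it, which is in line with HHI(Q 7lf) being available for only about half of the GLA.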
Taking into account the land area fraction for which the respective HHIs were not computed (last column of table B1), significant decreases of Q 7lf are projected on 27% of GLA with computed HHI but only on 13% in case of Q m (figure B1). The respective percentages for significant increases are more similar, 22% for Q 7lf and 27% for Q m. The hydrological hazard regarding Q 7lf is dominantly higher for the 2 °C world, with 7% of the GLA showing a significantly larger hazard than in the 1.5 °C world, while the reverse is true for only 0.3%. Again, significant differences are concentrated at high latitudes (figure 2(c)).
The hazard pattern related to extreme high flows HHI(Q 7hf) differs strongly from the low flows pattern (compare figures 2(a) and (b) with figures 2(d) and (e)). There are more grid cells with significantly increasing high flows than with significantly decreasing high flows, while the opposite is true for low flows. This reflects the well-established increase of climate variability as caused by climate change. Significantly increased high flows occur in South and Southeast Asia and Central Africa. With an additional half a degree warming, high flows intensify there, and parts of South America also get significant increases as compared to today. However, while low flows increase in northern latitudes (figures 2(a) and (b)), high flows decrease in many of these grid cells due to decreased snow melt volumes, e.g. in Scandinavia and Eastern Europe. Significant decreases (increases) of Q 7hf are found on 10% (21%) of the GLA for the 2 °C GW level but only on 7% (11%) for the 1.5 °C GW level. The hydrological hazard regarding Q 7hf is significantly larger in the 2 °C world than in the 1.5 °C world on 6% of the global land area, while the reverse is true for only 0.1%. The former areas are scattered all around the world (figure 2(f)).
Full colors show ensemble means of eight model combinations in grid cells where at least six out of eight model combinations agree on the direction of change while at least five out of eight model combinations result in significant differences between historical and future HIs according to the KS test. Weaker colors indicate all grid cells with the same agreement but significant differences only for 2-4 model combinations. Grey is used in areas where less than two model combinations show significant differences and less than six model combinations agree on the sign of change (a) and (b) or on the GW level that leads to a larger magnitude of the hazard (c); it is also used if condition 3 (see section 2.3) is not fulfilled. White areas indicate grid cells for which HHI could not be computed.
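The color classes of this caption amount to a counting rule over the eight model combinations, which may be sketched as below. The sketch is an interpretation rather than the procedure of the study: it assigns grey whenever fewer than six combinations agree on the sign or fewer than two show significant differences, it does not check that the significant combinations are the ones that agree in sign, and it does not model condition 3 of the significance definition. The per-combination significance flags (e.g. from a KS test) are taken as given; the function name is hypothetical.

```python
def classify_cell(changes, significant, min_agree=6, min_full=5, min_weak=2):
    """Assign a map class to one grid cell.

    changes:     signed HHI of each model combination (eight values)
    significant: one boolean per model combination, True where historical
                 and future HIs differ significantly
    Returns "full", "weak" or "grey".
    """
    n_increase = sum(1 for c in changes if c > 0)
    n_decrease = sum(1 for c in changes if c < 0)
    n_agree = max(n_increase, n_decrease)  # size of the majority direction
    n_significant = sum(1 for s in significant if s)

    if n_agree < min_agree or n_significant < min_weak:
        return "grey"
    if n_significant >= min_full:
        return "full"
    return "weak"  # sign agreement, but only 2-4 significant combinations
```

For example, six increases out of eight with five significant combinations yield a full color, whereas the same sign agreement with three significant combinations yields a weaker color.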

Aggregated results for all grid-cell specific HIs
Significant hazards and significant differences between hazards in the 1.5 °C and 2 °C worlds at the global scale and for country groups
While, in the 2 °C world, land area with a significant increase of Q m and Q 7hf is globally twice as large as the area with a significant decrease, the opposite is true for soil water storage in the driest month of the growing period S soil min and maximum snow storage S snow max (figure 3(a) top). The area with significantly increasing annual low flow Q 1−in−10dy is 70% larger than the area with significantly decreasing values, but for Q 7lf, the area with significant increases is 20% smaller than the area with significant decreases (figure 3(a) top). Regarding groundwater recharge GWR, differences between land fractions with increasing or decreasing values and between the two GW levels are small, but 55% of GLA had to be excluded from the computation because LPJmL computes zero GWR in a large number of grid cells. For all indicators except GWR, the area with either significantly wetter or drier conditions is smaller in the 1.5 °C world. The largest difference between the two GW levels is found for the area with a significant high flow increase, which almost doubles with half a degree of additional warming from 11% to 21% (figure 3(a) top).
For all indicators, the land area where the magnitude of the hazard (absolute value of HHI) is significantly larger in the 2 °C world than in the 1.5 °C world was found to be much larger than the area where the opposite is true (compare solid pink to almost invisible green line in figure 3(b) top), ranging from 4% for Q 1−in−10dy to 11% for Q m and S snow max. However, depending on the indicator, HHI cannot be reliably computed on parts of the GLA as HI for the historical decade does not exceed a threshold (table B1). Considering only the area of the grid cells for which the individual HHIs can be reliably computed, 33% and 15% of the area show significantly larger hazards in the 2 °C world than in the 1.5 °C world in case of S snow max and Q 7lf, respectively (figure B1(b)). For all indicators except Q 1−in−10dy and Q 7hf, the hazard is at least 50% stronger at 2 °C GW than at 1.5 °C on more than half of the area with significant differences.
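The two kinds of area percentages used here, relative to the whole GLA and relative to the area with computed HHI, can be illustrated with a small area-weighted sketch. It approximates the area of a 0.5° grid cell as proportional to the cosine of its center latitude and ignores land fractions within cells; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def area_percent(lat_centers, flagged, computed, relative_to_computed=False):
    """Area-weighted percentage of grid cells that are flagged.

    lat_centers: center latitude of each land grid cell in degrees
    flagged:     True where the condition of interest holds
                 (e.g. hazard significantly larger at 2 °C)
    computed:    True where the HHI could be computed for the cell
    relative_to_computed: if True, the reference area is the area with
                 computed HHI instead of the total area of all cells
    """
    # On a regular lat-lon grid, cell area scales with cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in lat_centers]
    hit = sum(w for w, f, c in zip(weights, flagged, computed) if f and c)
    if relative_to_computed:
        reference = sum(w for w, c in zip(weights, computed) if c)
    else:
        reference = sum(weights)
    return 100.0 * hit / reference
```

If the HHI is available for only half of the area, a hazard flagged on 25% of the total area corresponds to 50% of the area with computed HHI, which is why the percentages in figure B1 exceed those in figure 3 for indicators with many excluded cells.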
How do the projected hydrological hazards differ among country groups of similar wealth? The occurrence of hazards on percentages of the land area of countries belonging to one of the four World Bank income groups is shown in the bottom part of figure 3. Low income countries account for 10% of the GLA and 8% and 10% of the global population in 2010 and 2100, respectively, while lower middle income countries account for 16% of the GLA and 40% and 45% of the global population in 2010 and 2100. What is surprising is that for both income groups, the percentages of land areas that become significantly drier than in the quasi-stationary historical decade are, for most HIs, higher in case of 1.5 °C GW than in case of 2 °C GW, related mainly to projections for the Sahel and Indonesia, where drier conditions under 1.5 °C turn to wetter conditions under 2 °C (figures 1 and 2). The opposite is true for the global scale and for high income countries. As the low income and lower middle income country groups encompass many countries with high seasonal variability, HHI(Q 7lf) could only be computed reliably for 39% (43%) of the land area of low (lower middle) income countries, as compared to 47% of GLA.

Figure 3. Percentage of land area on which the HIs become significantly wetter (higher) or significantly drier (lower) in either the 1.5 °C or the 2 °C world as compared to quasi-stationary conditions at the beginning of the 21st century (a) and the percentage of land area on which the magnitude of the hazard related to a HI, i.e. |HHI|, is (significantly) larger in the 2 °C world than in the 1.5 °C world or vice versa (b). All results refer to the mean of the eight model combinations. 'Land area' refers to global land area except Greenland and Antarctica (Global) as well as to land area in low income countries (L), lower middle income countries (LM), upper middle income countries (UM) and high income countries (H) as defined by World Bank (2017).
In low and lower middle income countries, the area with a significant decrease in Q 7lf is projected to be more than three times larger than the area with significant increases. A significant decrease of Q 7lf was computed for the 2 °C world on 32% (25%) of the land area of low (lower middle) income countries for which HHI(Q 7lf) could be computed, while this was the case on 27% globally (figure B1(a)).
Upper middle income countries, including China, Russia and Brazil, account for 45% of GLA and 36% and 23% of the global population in 2010 and 2100, respectively. The hazard situation in this country group is quite similar to the global scale (figure 3). However, the situation in the high income country group, which includes most European countries but also Chile and Uruguay and accounts for 29% of GLA and 17% and 15% of the global population in 2010 and 2100, differs strongly from that in the three other country groups. In this group, the difference between hazards at the two GW levels is more distinct, and it is the only country group where the area with Q 7lf increases is larger than the area with decreases, in both GW scenarios (figure 3(a)). Regarding snow storage, the high income countries have the highest percentage of land area affected by a significant decrease. However, only 60% of cells with snow cover during the historical decade in this country group but 84% of those cells in the low income country group are projected to suffer from a significant decrease of snow storage in a 2 °C world (figure B1(a)). For all four country groups and HHIs, the percentage of grid cells in which the magnitude of the hazard in the 1.5 °C world is significantly larger than in the 2 °C world is very small (figure 3(b)).

Table 2. Percent of global population in 2100 affected by changes of seven hydrological indicators HI in case of either the 1.5 °C or the 2 °C world as compared to quasi-stationary historical conditions at the beginning of the 21st century. Arithmetic means of the eight model combinations (averaged over 20 runs) are listed. The last column shows the percentage for which the hydrological hazard indicator HHI was not computed because the threshold for HI in the historical decade (table 1) was not exceeded.

Population and land area affected by certain levels of hydrological hazards
As projected HI changes were found to be significant on less than half of the GLA, it is informative to analyze also non-significant changes by showing which GLA fractions are exposed to certain hazard levels (table B1). Table B1 also indicates uncertainty of projections by listing the ranges of the eight model combinations. In addition, exposure of human population to those hazard levels was quantified (table 2). At 2 °C GW a higher percentage of the global population than at 1.5 °C GW would be affected by a relevant hydrological hazard, here defined as an ensemble mean HI change of more than ±10%, considering all indicators except GWR (table 2). The same is true for GLA (table B1). Averaged over all model combinations, 10% of the global population would be spared a relevant change in mean annual streamflow if GW were constrained to 1.5 °C instead of to 2 °C, as 85% and 75% of the global population in 2100 would not suffer from a relevant hydrological hazard related to Q m in the 1.5 °C and 2 °C world, respectively (table 2). With a HHI(Q m) range of about 30%, the variation among model combinations is large (table B1). When comparing HHIs of the other three streamflow-related indicators for extreme annual or seasonal flows to those of Q m, the expected increased climate variability is manifest, population not subject to changes of more than 10% being smaller than for Q m (table 2). In addition, Q 1−in−10dy tends to decrease more strongly than Q m, while Q 7hf, which may lead to flooding, is computed to increase more strongly than Q m. In the 2 °C world, Q 7hf is projected to increase by more than 10% in grid cells where 28% of the global population live, as compared to only 17% in the 1.5 °C world. More people are projected to suffer from a relevant Q 7lf decrease than from a relevant increase, while the opposite is true for the other streamflow-related HIs (table 2). This is consistent with figure 3 that visualizes significant changes.
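The exposure figures discussed above combine the ensemble mean HI change per grid cell with gridded population. A minimal sketch of such a tally is given below, assuming one population value and one ensemble mean change (in percent, or None where the HHI was not computed) per grid cell. The ±10% limit follows the definition of a relevant hazard in the text; the function name and category labels are hypothetical.

```python
def exposure_shares(population, mean_change, relevant=10.0):
    """Percent of total population per hazard category.

    population:  population of each grid cell (e.g. scenario for 2100)
    mean_change: ensemble mean HI change of each cell in percent,
                 None where the HHI could not be computed
    relevant:    limit (in percent) above which a change counts as relevant
    """
    totals = {"decrease": 0.0, "increase": 0.0,
              "not relevant": 0.0, "not computed": 0.0}
    for pop, change in zip(population, mean_change):
        if change is None:
            key = "not computed"
        elif change > relevant:
            key = "increase"
        elif change < -relevant:
            key = "decrease"
        else:
            key = "not relevant"
        totals[key] += pop
    total_population = sum(population)
    return {k: 100.0 * v / total_population for k, v in totals.items()}
```

Evaluating the function once with the 1.5 °C changes and once with the 2 °C changes gives the kind of comparison reported in table 2, for instance the population share spared a relevant change if GW is limited to 1.5 °C.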
With half a degree additional warming, the population subject to a relevant decrease of S soil min almost doubles from 14% to 27% (table 2). The majority of the 10% of the global population living in snow area may suffer from relevant decreases of snow storage at both GW levels. Half a degree additional warming raises the fraction of the people in snow areas that is affected by a decrease of more than 30% from one third to one half.
Comparing the global values of table 2 to the respective values computed for the four country groups (not shown), it becomes manifest that the population additionally subject to a relevant decrease in S soil min in case of half a degree additional warming is particularly high in high income countries. Only 4% of the population in this country group may be affected in the 1.5 °C world but 31% in the 2 °C world. In middle income countries, affected population doubles. The population subject to relevant decreases of Q m and Q 7lf more than doubles in the high income country group.
In this group, the fraction of the population in snow areas (31% of the total population) that is affected by a decrease of snow storage of more than 30% almost triples to 60%. The effect of half a degree of additional warming regarding extreme high flows is rather small in the high income and upper middle income country groups, but in the low income (lower middle) country group, a relevant increase of such high flows is projected to affect 9% (18%) of the population at 1.5 °C GW, but 24% (34%) at 2 °C GW.

Definition of hazard indicators
For indicating a hazard due to future climate change, one may select absolute differences (e.g. in mm/yr) or relative differences (e.g. in percent). Relative differences are generally preferred regarding water flows and storages as they take into account the strong spatial heterogeneity of the HIs. A certain absolute change in a HI is a much stronger hazard in a dry region with a low HI value under current conditions than in a wet region, which is reflected by a higher relative change in the dry region. The disadvantage of selecting percent changes as HHI is that they cannot be (reliably) computed if the HI under current conditions is zero or very small. The HHI for snow storage does not reflect hazards for population downstream of the snow-covered regions. However, hazards due to the effect of changed snow storage on downstream streamflow are taken into account by the four streamflow-related HHIs.
HIs may increase or decrease with climate change, and the sign of change, e.g. whether it becomes wetter or drier, may be important for assessing the hazard. However, whether a decrease or an increase is a hazard often depends on local conditions. For example, increased high flows may indicate a hazard for human settlements but also better habitat conditions for freshwater-dependent biota. In addition, any change from current conditions may be hazardous as humans and other biota have adapted to these conditions. To assess to what degree a hazard is larger for a 2 °C GW than for a 1.5 °C GW, we therefore compared the magnitudes of relative changes |HHI|.
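The comparison of hazard magnitudes can be written as a ratio of absolute HHI values, as visualized in figure 1(c). The following lines are a minimal sketch with a hypothetical function name; returning None for a zero change at 1.5 °C is an assumption made here to avoid a division by zero.

```python
def magnitude_ratio(hhi_1p5, hhi_2p0):
    """Ratio |HHI at 2 °C| / |HHI at 1.5 °C| for one grid cell.

    Values above 1 mean a larger hazard magnitude in the 2 °C world,
    values below 1 a larger magnitude in the 1.5 °C world, regardless
    of the sign of the change.
    """
    if hhi_1p5 == 0:
        return None  # ratio undefined without any change at 1.5 °C
    return abs(hhi_2p0) / abs(hhi_1p5)
```

A decrease of 10% at 1.5 °C that turns into an increase of 15% at 2 °C thus yields a ratio of 1.5, i.e. a hazard that is 50% stronger at 2 °C although the sign of change differs.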

Representativeness of simulations
We only analyzed eight model combinations (4 GCMs, 2 GHMs) in this study. A comparison of figure 1(a) and (b) to the global maps of ensemble mean runoff changes at 1.5 °C and 2 °C GW computed from 55 model combinations (5 GCMs, 11 GHMs) (Schleussner et al 2016b, their figure 7) shows a similar spatial pattern of decreases and increases, with flow increases in high northern latitudes and India and decreases in Southern Europe and the Amazon basin. Regarding the differential hazards at the two GW levels, Schleussner et al (2016b) also identified the Mediterranean region to be most significantly affected by an additional runoff reduction due to half a degree additional warming. In a multi-model study for Europe using five hydrological models, Donnelly et al (2017) found a considerable difference between the changes of mean runoff and low runoff at 1.5 °C and 2 °C GW. This supports the assumption that the small number of model combinations applied in this study is to a certain degree representative of a larger ensemble.

Source of discrepancies among model combination results
It is generally not known why GHMs translate climate change signals differently into changes of hydrological variables (Döll et al 2016). A major difference between LPJmL and WaterGAP is that only the former computes vegetation dynamics and thus the impact of changing climate and atmospheric CO2 concentrations on the vegetation structure and its transpiration (Davie et al 2013, Gerten et al 2014). Moreover, only WaterGAP is calibrated against observed mean annual streamflow. When analyzing data like those presented in table B1 but separately for the two GHMs, one finds for five of the seven cell-based HHIs that the uncertainty due to the four GCMs is larger than that due to the two GHMs. For the 7 day low flow, the uncertainty sources have approximately the same relevance, while in case of groundwater recharge, the uncertainty caused by the GHM selection is dominant. The latter is due to the fact that LPJmL mostly computes much smaller (or even zero) GWR values for the historical decade than WaterGAP (the GWR of which has been validated, Döll and Fiedler 2008) and therefore computes much larger percent changes than WaterGAP. WaterGAP calculates larger areas with decreases or increases than LPJmL for both annual streamflow indicators. For high flows, WaterGAP results in larger increases, while there is not much difference for low flow changes. Projected changes of minimum soil moisture differ most strongly between the two GHMs, with WaterGAP projecting stronger decreases with increasing GW and LPJmL stronger increases. It is not known to what extent this discrepancy is related to the lack of active vegetation in WaterGAP or to the different soil depths for which the two GHMs provided soil moisture values.
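One simple way to compare the two uncertainty sources is to contrast the range across GCMs for a fixed GHM with the range across GHMs for a fixed GCM. The sketch below does this for a single aggregated quantity, such as the percentage of area or population affected by a relevant change. It is an illustrative measure with a hypothetical function name, not necessarily the analysis behind the statements above.

```python
from statistics import mean


def spread_by_source(values):
    """Average range due to GCM choice and due to GHM choice.

    values: dict mapping (gcm, ghm) to an aggregated result of that
            model combination; every GCM must be paired with every GHM.
    Returns (gcm_spread, ghm_spread).
    """
    gcms = sorted({gcm for gcm, _ in values})
    ghms = sorted({ghm for _, ghm in values})

    def value_range(items):
        return max(items) - min(items)

    # Range across GCMs, evaluated separately for each GHM, then averaged.
    gcm_spread = mean(
        value_range([values[(g, h)] for g in gcms]) for h in ghms)
    # Range across GHMs, evaluated separately for each GCM, then averaged.
    ghm_spread = mean(
        value_range([values[(g, h)] for h in ghms]) for g in gcms)
    return gcm_spread, ghm_spread
```

A GCM spread clearly above the GHM spread would correspond to the situation reported for five of the seven indicators, and the reverse to the case of groundwater recharge.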

Conclusions
The well-known uncertainty of projections of climate change hazards due to differences among both GCMs and GHMs makes a clear distinction of the hydrological hazards at 1.5 °C and 2 °C GW difficult. Nevertheless, based on the HAPPI simulations of quasi-stationary climate, which come without the disadvantages of transient climate simulations used in earlier studies, and using a new strict definition of significance, we identified (1) areas where significant hydrological hazards, as quantified by relative changes of seven different hydrological indicators (HIs), may occur in 1.5 °C and 2 °C worlds and (2) areas where each hydrological hazard (independent of the sign of change) is significantly larger at 2 °C than at 1.5 °C GW (or vice versa). For all HIs except GWR, global land areas with either significantly wetter or drier conditions are smaller for the 1.5 °C world than for the 2 °C world (figures 3(a) and B1(a)). Area and population affected by a relevant HI change of more than ±10% are higher for a 2 °C GW than for a 1.5 °C GW (tables 2 and B1). However, on the majority of the GLA, model uncertainty makes it impossible to clearly distinguish the magnitude of hazards at the two GW levels. The hydrological hazard in the 2 °C world was computed to be significantly larger than in the 1.5 °C world on only 11% of the GLA in the case of Q_m and S_snow_max, decreasing to 4% in the case of Q_1-in-10dy (figure 3(b)). The opposite was found for much smaller areas. The 2 °C world leads to a significantly stronger hazard on one third of current snow areas regarding S_snow_max, and on 15% of the GLA with low flows of currently more than 0.1 m³ s⁻¹ (figure B1(b)).
Our study agrees with many previous studies on the overall spatial pattern of wetting and drying under GW, with strong wetting in the higher northern latitudes and South and Southeast Asia, and strong drying around the Mediterranean but also in the Amazon basin and in Southern Africa (Döll et al 2015, Sedláček and Knutti 2014). Q_m changes (decreases of 10%-30%) in the Mediterranean become significant if GW increases from 1.5 °C to 2 °C. While more land area and population are projected to be affected by increases than by decreases in the case of Q_m, Q_1-in-10dy, Q_7hf and GWR, the opposite is true for Q_7lf, S_snow_max and S_soil_min. Consistent with increasing climate variability, extreme low flows decrease and extreme high flows increase with GW when averaged globally, but the opposite is true for Scandinavia and Eastern Europe. Among the HIs, Q_m shows the lowest projected percentage changes (table 2). This should be taken into account when interpreting climate change hazard studies focusing on Q_m or mean runoff.
Assessing freshwater-related risks by relating the hydrological hazards to population suggests that around 10% of the global population would be spared a relevant change in Q_m and Q_7lf of more than 10% if GW were constrained to 1.5 °C instead of 2 °C (table 2). Low income countries will be affected most strongly by decreases of Q_7lf, the area with projected significant decreases being three times as large as the area with projected increases (figure 3(a)). The high income country group is the only one for which the area and population fraction subject to significant projected increases of Q_7lf is larger than the fraction with decreases. At the same time, it may be exposed least among the four groups to increased Q_7hf, with the lower middle income group being most affected. Regarding S_snow_max, the high income countries have the highest percentage of land area affected by a significant decrease, but when related to snow-covered land area in the historical decade, a larger percentage is affected in low income countries (figure B1(a)). Regarding the differential risks at the two GW levels, the high income country group could particularly benefit from keeping GW at 1.5 °C. There, half a degree of additional warming would increase the population affected by relevant decreases of S_soil_min, Q_m and Q_7lf more strongly than in other country groups. The effect of half a degree of additional warming on Q_7hf, however, would be felt most strongly in the two poorer country groups, where the percentage of population suffering from relevant increases of extreme high flow would approximately double from 1.5 °C GW to 2 °C GW.

Acknowledgments
This work was partially funded by the German Federal Ministry of Education and Research through grants 01LS1613B and 01LS1613A, and by the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety through grant 16_II_148_Global_A_IMPACT. It used science gateway resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231. We thank two anonymous reviewers for their helpful comments on our manuscript.

Appendix A. Global hydrological models, exposure and vulnerability data and determination of significance

A1. Global hydrological models
Both applied GHMs, WaterGAP and LPJmL, cover all land areas of the globe except Antarctica with a spatial resolution of 0.5°. They share the same land mask and compute water flows and storages with a temporal resolution of one day, taking into account the impact of human water use and man-made reservoirs. For isolating the effect of GW, irrigation water use was computed as a function of the changed climate but growing periods were kept constant as was water use in the other sectors. Different from WaterGAP, LPJmL simulates the active response of vegetation to changing climate and atmospheric CO2 concentrations, which affects evapotranspiration. Following the HAPPI protocol (Mitchell et al 2017), we applied for the period 2006-2015 transient historical CO2 concentrations of 380.93-399.41 ppm, and constant values of 423.4 ppm and 486.6 ppm for the 1.5 °C and 2 °C experiments, respectively. Both GHMs were driven by the bias-corrected output of the four GCMs (daily precipitation, temperature, incoming short-wave and incoming long-wave radiation), resulting in eight model combinations. The outputs of all model combinations are considered to be equally likely.
WaterGAP (Müller Schmied et al 2014) consists of five sectoral water use models, the linking model GWSWUSE that computes net abstractions from groundwater and surface water, and the WaterGAP Global Hydrology Model (WGHM). Human water withdrawals and consumptive water use in the sectors households, manufacturing, cooling of thermal power plants, livestock and irrigation are computed by the water use models. Then, net water abstractions from groundwater and surface water are distinguished. The net abstractions become input of WGHM, together with time series of climate variables. WGHM computes various water flows (e.g. evapotranspiration, groundwater recharge and streamflow) as well as water storages in ten compartments, including snow, soil, groundwater and the surface water bodies lakes, man-made reservoirs and wetlands. It simulates the dynamic extent of surface water bodies that affects evapotranspiration and groundwater recharge from surface water bodies. Groundwater recharge includes diffuse groundwater recharge that is modeled as a fraction of total runoff depending on relief, soil texture, hydrogeology and the existence of permafrost and glaciers (Döll and Fiedler 2008) as well as groundwater recharge from surface water bodies in semi-arid and arid regions. WGHM is calibrated against observed mean annual streamflow at 1319 streamflow gauging stations by adjusting one to three parameters in the upstream area of the station. This calibration allows a reasonable quantification of renewable water resources but for many rivers does not lead to an appropriate simulation of streamflow seasonality and other streamflow regime characteristics (Hunger and Döll 2008).
If the observation-based EWEMBI climate data set that was also applied for bias-correcting the HAPPI GCM output (section 2.1) is used to drive WaterGAP, monthly Nash-Sutcliffe efficiencies for streamflow at the 1319 gauging stations are larger than 0.7 for 372 stations, between 0.5 and 0.7 for 349 stations and smaller than 0.5 for 598 stations. Still, comparison of observed streamflow regime indicators (different streamflow percentiles representing statistical low and high flows) to the values computed by nine (or seven) GHMs showed that WaterGAP is one of the best fitting models (Gudmundsson et al 2012, Tallaksen and Stahl 2014).
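The Nash-Sutcliffe efficiency compares simulated with observed streamflow: a value of 1 indicates a perfect fit, and 0 means the simulation is no better than the observed mean. The following minimal sketch shows the standard formula and the three efficiency classes reported above. The function names and the treatment of values exactly at 0.5 or 0.7 are assumptions made for illustration and are not taken from the WaterGAP evaluation.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of two equally long (e.g. monthly) series."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variance = sum((o - mean_obs) ** 2 for o in observed)
    # Assumption: observed flow is not constant, so obs_variance is non-zero.
    return 1.0 - squared_error / obs_variance


def nse_class(nse):
    """Assign an efficiency value to one of the three classes used in the text."""
    if nse > 0.7:
        return "larger than 0.7"
    if nse >= 0.5:  # assumption: boundary values fall into the middle class
        return "between 0.5 and 0.7"
    return "smaller than 0.5"


# Station counts per class as reported in the text.
reported_counts = {
    "larger than 0.7": 372,
    "between 0.5 and 0.7": 349,
    "smaller than 0.5": 598,
}
# The three classes together cover all calibration stations.
total_stations = sum(reported_counts.values())
```

Summing the reported class counts recovers the 1319 gauging stations used for calibration.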
LPJmL (Rost et al 2008, Schaphoff et al 2013) is a dynamic global vegetation model representing the growth and productivity of nine natural plant functional types and 13 crop types (including pasture), whose inter- and intra-annual dynamics are computed in full coupling with the carbon and water cycles (including runoff, soil moisture, evapotranspiration and percolation through several soil layers with the outflow from the lowest layer representing groundwater recharge). Irrigation demand and associated water withdrawal on areas equipped for irrigation is computed based on hydrological and vegetation physiological process representations. These processes include the direct coupling of transpiration and CO2 assimilation, such that changes in atmospheric CO2 concentration translate into a reduction in transpiration at leaf level. If the increase in water use efficiency through rising CO2 concentration is strong enough to allow for areal expansion of natural vegetation (crop areas are fixed), transpiration may also increase in some regions as a net result. Effects on the water balance may also occur in response to climate change-driven changes in vegetation types and growing seasons. Human water withdrawals for sectors other than irrigation (taken from WaterGAP) are assumed to be met prior to agricultural water withdrawal. Modelled streamflow is not calibrated (as opposed to WaterGAP) such that the model bias might be larger in some river basins, yet earlier studies demonstrated overall good validation results not only for streamflow and water withdrawals (Biemans et al 2009, Jägermeyr et al 2017) but also for non-hydrological features such as crop yields and biogeochemical processes (Schaphoff et al 2018).

A2. Determination of significance
In a first step, the two-sample Kolmogorov-Smirnov (KS) test was applied to determine whether the hydrological hazards in the 2 °C world are, in a statistical sense, significantly different from hazards in the 1.5 °C world, independent of the sign of HI change. For each HI, GW level, model combination and 0.5° grid cell, 20 HHI values (i.e. percent changes of the HIs) were computed from the 10 year output of each of the 20 ensemble runs. Absolute values of these 20 HHIs were considered to form a probability distribution of the specific hydrological hazard that the grid cell would suffer from in either a 1.5 °C or a 2 °C world. The null hypothesis of the KS test is that the 2 °C HHIs and the 1.5 °C HHIs are drawn from the same probability distribution. A rejection of the test's null hypothesis at a significance level of 90% is taken as a robust difference between the specific hydrological hazards at the two warming levels for the tested model combination. Evaluating the information provided by the eight model combinations that may result in strongly different HHI values, we defined three conditions that had to be fulfilled to call the difference between the HHI in the 1.5 °C world and the 2 °C world significant in this study: (1) a significant difference is identified by the KS test for at least five of the eight model combinations, (2) at least six model combinations agree that the hazard, determined by averaging the absolute HHI values of the 20 runs, is larger in the 2 °C world than in the 1.5 °C world (or vice versa), and (3) both the ensemble mean of the ratio |HHI_2°C|/|HHI_1.5°C| and the ratios of the majority of ensemble members agree that the ratio is larger (or smaller) than 1. Condition 1 guarantees that significant differences according to the KS test do not stem from only one GHM, while condition 2 ensures that at most one GCM disagrees with the others on which GW level leads to the larger hazard.
Condition 3 takes care of the rare situation where the ensemble mean (on which the presentation of results focuses) is distorted in the opposite direction by one or two disagreeing model combinations. If, for example, six model combinations agree that hazard is larger for 2 °C GW, while the ensemble mean (of eight model combinations) indicates that the opposite is true, the ensemble mean result would not be considered to be significant.
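The three conditions can be sketched for a single grid cell and HI as follows. This is an illustrative reading of the procedure, not the code used in the study. It assumes that the 90% level corresponds to alpha = 0.10, that the mean absolute HHI at 1.5 °C is non-zero, and that the ratios in condition 3 are taken per model combination. The KS test here uses the asymptotic critical value, which is only an approximation for samples of 20 values; a library routine such as `scipy.stats.ks_2samp` would be the usual choice in practice. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_rejects(a, b, alpha=0.10):
    """Two-sample KS test with the asymptotic critical value (approximation)."""
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


def hazard_difference(hhi_15, hhi_20, alpha=0.10):
    """Return '2C', '1.5C' (world with the larger hazard) or None.

    hhi_15, hhi_20: one list of 20 HHI values (percent changes) per model
    combination, for the 1.5 degC and 2 degC worlds of a single grid cell.
    """
    abs_15 = [[abs(v) for v in runs] for runs in hhi_15]
    abs_20 = [[abs(v) for v in runs] for runs in hhi_20]

    # Condition 1: the KS test rejects for at least five model combinations.
    n_rejected = sum(ks_rejects(x, y, alpha) for x, y in zip(abs_15, abs_20))
    if n_rejected < 5:
        return None

    # Mean absolute HHI per model combination (assumed non-zero at 1.5 degC).
    mean_15 = [sum(x) / len(x) for x in abs_15]
    mean_20 = [sum(y) / len(y) for y in abs_20]
    ratios = [m20 / m15 for m15, m20 in zip(mean_15, mean_20)]
    mean_ratio = sum(ratios) / len(ratios)

    for label, larger in (("2C", lambda r: r > 1), ("1.5C", lambda r: r < 1)):
        n_agree = sum(larger(r) for r in ratios)
        # Condition 2: at least six combinations agree on the larger hazard.
        # Condition 3: the ensemble-mean ratio and the majority of the
        # individual ratios lie on the same side of 1.
        if n_agree >= 6 and larger(mean_ratio) and n_agree > len(ratios) / 2:
            return label
    return None
```

With eight model combinations, the majority requirement in condition 3 is already implied by condition 2, so the check on the ensemble-mean ratio is the part that can still reject a cell, as in the example given above.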
To determine whether the hydrological situation in the two future worlds differs significantly from historical conditions, the 20 HI values for historical and GW level conditions were subjected to the KS test for each model combination. Similar to above, the HHIs for each warming level were considered to be significant if (1) the KS test rejected the null hypothesis for at least five of the model combinations, (2) at least six model combinations agree on the sign of the HHI, i.e. on the direction of change, and (3) the direction of change of the ensemble mean agrees with the direction of change of the mean of model combinations that agree on the direction of change.
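The sign-based variant of the three conditions can be sketched in the same spirit. The sketch assumes that the per-combination KS test results and the mean HHI (percent change) of each model combination have already been computed; the function name and the return labels are hypothetical.

```python
def significant_change(ks_rejected, mean_hhi):
    """Return 'increase', 'decrease' or None for one grid cell and GW level.

    ks_rejected: one boolean per model combination (KS test of historical
                 versus GW-level HI values rejected the null hypothesis).
    mean_hhi:    mean percent change per model combination.
    """
    # Condition 1: KS test rejects for at least five model combinations.
    if sum(ks_rejected) < 5:
        return None

    # Condition 2: at least six model combinations agree on the sign.
    n_increase = sum(v > 0 for v in mean_hhi)
    n_decrease = sum(v < 0 for v in mean_hhi)
    if n_increase >= 6:
        direction = 1
    elif n_decrease >= 6:
        direction = -1
    else:
        return None

    # Condition 3: the ensemble mean has the same sign as the agreeing
    # combinations (whose own mean necessarily carries that sign).
    ensemble_mean = sum(mean_hhi) / len(mean_hhi)
    if ensemble_mean * direction <= 0:
        return None
    return "increase" if direction > 0 else "decrease"
```

For instance, six combinations with a small increase and two with a strong decrease fail condition 3, because the ensemble mean is negative although the majority indicates an increase.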

A3. Data related to exposure and vulnerability
Exposure to the freshwater-related hazard is quantified by human population numbers. To obtain population in 0.5° grid cells in 2010, the GPWv3 gridded population estimate for the year 2010 (CIESIN 2010) was aggregated from its original resolution of 2.5 arcminutes to 0.5° grid cells. Population in 2100 was computed by scaling the 2010 grid values with country totals of the 'Middle of the Road' scenario SSP2 (SSP Database at https://tntcat.iiasa.ac.at, Jones and O'Neill 2016), neglecting changes in population distribution within countries. Vulnerability and coping capacity are taken into account by aggregating results for four country groups that are composed of countries with similar per-capita gross national income, using the April 2017 World Bank classification of countries into low income, lower middle income, upper middle income and high income countries (World Bank 2017).

Figure B1. Same as in figure 3, but land area does not refer to the total global or country group areas but to the land area for which each of the HHIs was computed (see section 2.2).
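The population processing described in A3 can be illustrated with a minimal sketch. Since 0.5° equals 30 arcminutes, each 0.5° cell collects a block of 12 × 12 cells of the 2.5 arcminute grid. The function names, the data layout and the choice to derive the 2010 country totals from the gridded data itself are assumptions made for illustration and are not taken from the study.

```python
def aggregate_blocks(fine, factor=12):
    """Sum a fine grid (list of rows) into blocks of factor x factor cells."""
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = [[0.0] * (cols // factor) for _ in range(rows // factor)]
    for r in range(rows):
        for c in range(cols):
            coarse[r // factor][c // factor] += fine[r][c]
    return coarse


def scale_to_country_totals(pop_2010, country_of_cell, totals_future):
    """Scale 2010 cell population so that country sums match future totals.

    pop_2010:        dict cell id -> population in 2010
    country_of_cell: dict cell id -> country code
    totals_future:   dict country code -> projected country total
    The spatial distribution within each country is kept unchanged.
    """
    totals_2010 = {}
    for cell, pop in pop_2010.items():
        country = country_of_cell[cell]
        totals_2010[country] = totals_2010.get(country, 0.0) + pop
    return {
        cell: pop * totals_future[country_of_cell[cell]]
        / totals_2010[country_of_cell[cell]]
        for cell, pop in pop_2010.items()
    }
```

Because every cell of a country is multiplied by the same factor, the within-country distribution of population stays as in 2010, which corresponds to the simplification stated in A3.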