An innovative approach to the assessment of hydro-political risk: A spatially explicit, data driven indicator of hydro-political issues

Highlights
• Data driven spatially explicit index of hydro-political issues magnitude.
• Estimation of the non-linear interactions between factors determining water issues.
• Increasing climate change and population are likely to boost hydro-political issues.

Competition over limited water resources is one of the main concerns for the coming decades. Although water issues have not been the sole trigger for warfare in the past, tensions over freshwater management and use represent one of the main concerns in political relations between riparian states and may exacerbate existing tensions, increase regional instability and social unrest. Previous studies made great efforts to understand how international water management problems were addressed by actors in a more cooperative or confrontational way. In this study, we analyze the pre-conditions favoring the emergence of water management issues in shared water bodies, rather than focusing on the way water issues are then managed among actors. We do so by proposing an innovative analysis of past episodes of conflict and cooperation over transboundary water resources (jointly defined as "hydro-political interactions"). On the one hand, we aim at highlighting the factors that are more relevant in determining water interactions across political boundaries. On the other hand, our objective is to map and monitor the evolution of the likelihood of experiencing hydro-political interactions over space and time, under changing socioeconomic and biophysical scenarios, through a spatially explicit data driven index. Historical cross-border water interactions were used as indicators of the magnitude of the corresponding joint water management issues. These were correlated with information about river basin freshwater availability, climate stress, human pressure on water resources, socioeconomic conditions (including institutional development and power imbalances), and topographic characteristics. This analysis allows for identification of the main factors that determine water interactions, such as water availability, population density, power imbalances, and climatic stressors.
The proposed model was used to map at high spatial resolution the probability of experiencing hydro-political interactions worldwide. This baseline is then compared with four distinct climate and population density projections to estimate trends for hydro-political interactions under future conditions (2050 and 2100), while considering two greenhouse gas emission scenarios (moderate and extreme climate change). The combination of climate and population growth dynamics is expected to impact negatively on the overall hydro-political risk by increasing the likelihood of water interactions in the transboundary river basins, with an average increase ranging from 74.9% (2050 population and moderate climate change) to 95% (2100 population and extreme climate change). Future demographic and climatic conditions are expected to exert particular pressure on already water stressed basins such as the Nile, the Ganges/Brahmaputra, the Indus, the Tigris/Euphrates, and the Colorado. The results of this work allow us to identify current and future areas where water issues are more likely to arise, and where cooperation over water should be actively pursued to avoid possible tensions, especially under changing environmental conditions. From a policy perspective, the index presented in this study can be used to provide a sound quantitative basis for the assessment of the Sustainable Development Goal 6, Target 6.5 "Water resources management", and in particular of indicator 6.5.2 "Transboundary cooperation".

Introduction
Future availability of freshwater for human consumption in a changing world represents one of the main concerns of the current political debate. Water crises have been placed among the major risk factors for the coming decades by the Global Risks Perception Surveys conducted by the World Economic Forum between 2015 and 2017 (WEF, 2017, 2016). Increasing demographic pressure, environmental degradation, and climate change impacts on the spatio-temporal distribution of water represent the largest determinants of current and future water related issues. Although it is intuitive that water stress is likely to increase the competition over water (Malthus, 1798), it is not completely clear how the combinations of factors influencing water demand and availability alone could lead to such different outcomes in different watersheds spread around the planet. Evidence shows that the consequences of comparable levels of physical water stress have been handled unevenly in different geographical areas and historical contexts. Socioeconomic and cultural characteristics (Wolf, 2009), jointly with topographic factors (Beck et al., 2014; Gleditsch et al., 2006; Munia et al., 2016), were identified as the drivers most likely to influence hydro-political dynamics. Resource scarcity is likely to increase tensions, especially when associated with socio-cultural stressors (Sirin, 2011), but, on the other hand, the lack of a vital resource such as water is also likely to boost cooperation between actors sharing the same freshwater sources (Bernauer et al., 2012b; Wolf, 2009, 2007; Wolf et al., 2003). The literature has hardly identified common features between countries involved in water issues: similar levels of tension over water arose between countries independently of their climate zone, population size, territorial extension, or level of democracy (Wolf, 2009).
Moreover, the same international water issue frequently resulted in episodes of conflict and cooperation at the same time (Gerlak and Zawahri, 2009; Kalbhenn and Bernauer, 2012; Wolf, 2009; Wolf et al., 2003; Yoffe et al., 2004; Zeitoun et al., 2011; Zeitoun and Mirumachi, 2008). Although several cases of tensions, mostly non-violent, were also recorded, the literature shows that water related issues are more likely to be resolved with cooperation between the countries sharing the transboundary basins (De Stefano et al., 2010b; Wolf, 2009, 2007; Wolf et al., 2003; Yoffe et al., 2004, 2003). Analyzing historical events, Böhmelt et al. (2014) concluded that physical availability and water demand components are only part of the aspects to be considered for the analysis of water related issues. The literature on political science, geopolitics, and diplomacy showed that socioeconomic factors, jointly with institutional capacity, legal framework, and cultural background, also influence the diplomatic interactions between countries or actors sharing resources (Bernauer et al., 2012b; Wolf, 2009; Zeitoun et al., 2011) ( 1 ).
The goal of this study is to design an empirically based index aimed at analyzing and mapping the interactions between biophysical and socioeconomic factors linked to water issues at global scale. This was done by analyzing water availability and demand, as well as the socioeconomic, institutional, legal, and cultural context: factors that are likely to influence transboundary water issues. The final goal is to provide the policy maker with an instrument able to capture historical and current determinants of water related issues, and also to construct scenarios and simulate sets of policy options. The index presented here was calculated by applying a machine learning model to data layers at detailed spatial resolution for the assessment of water related issues and their determinants in the interactions between countries in transboundary basins.
1.1. Assessing the factors influencing water cross-border issues
1.1.1. From water conflict and cooperation events to water interactions
Political debate at the highest level has often expressed concern about an increasing number of violent conflicts related to water resources use and appropriation, in particular in the case of transboundary basins. Such concern led to the inclusion in Agenda 2030 of a specific indicator on "Proportion of transboundary basin area with an operational arrangement for water cooperation" 2 (6.5.2), together with "Degree of integrated water resources management implementation" (6.5.1), for the assessment of Target 6.5 "Water resources management". Nevertheless, the analytic evidence of the correlation between violent conflicts and climatic factors is not completely clear (Buhaug, 2010; Kallis and Zografos, 2014; Zeitoun and Mirumachi, 2008), and thus the need emerges for methods oriented to pursue a scientifically sound and quantitative assessment of available information, such as the one proposed herein. The literature found a strong correlation between temperature (Burke et al., 2009), or drought events (Couttenier and Soubeyran, 2014), and civil war episodes in Africa. Buhaug (2010) firmly contested these findings and found the conflicts to be explained by structural and contextual conditions, such as exclusion of ethnic groups from the political context, poor economic management, and geopolitical dynamics. Hsiang et al. (2011) proposed a meta-analysis based on 60 studies focusing on 45 historical conflicts on a global scale, concluding that temperature and rainfall variability are significantly connected to violent events. Water related issues follow different dynamics with respect to civil conflicts: historical water crises were often resolved with more or less satisfactory, formal or informal, agreements between the parties (De Stefano et al., 2010b).
Water conflicts in history are, in fact, peripheral events and none of them reached a formal declaration of war (Kalbhenn and Bernauer, 2012; Katz, 2011; Wolf, 1998, 2007; Yoffe et al., 2004). The fact that water war episodes were not recorded in the past does not imply that this could not happen in the future (Kallis and Zografos, 2014). Water related disputes were sometimes identified as igniting factors exacerbating international issues of a different nature (Wolf, 2009). On the other hand, cooperation over transboundary basins often resulted in a benefit multiplier opportunity, associated with lower costs, increasing benefits and the possibility of cooperation beyond water (Sadoff and Grey, 2002). In the analysis of historical hydro-political events, research points out that certain degrees of conflict and cooperation coexist in the same water related event (Gerlak and Zawahri, 2009; Kalbhenn and Bernauer, 2012; Wolf, 2009; Wolf et al., 2003; Yoffe et al., 2004; Zeitoun et al., 2011; Zeitoun and Mirumachi, 2008). For this reason, some authors (in particular Zeitoun and Mirumachi, 2008) claimed it would be more appropriate to analyze the transboundary water interactions, that is, the conflict and cooperation dynamics within the same water issue, regardless of their nature (Kallis and Zografos, 2014; Watson, 2015; Zeitoun and Mirumachi, 2008). This study adopts this approach, focusing on the historical water interactions rather than on the specific conflict or cooperation events linked with each of the water related transboundary issues, and uses them as an indicator of the hydro-political risk, intended not as conflict risk but rather as the risk of experiencing water related issues. As specified in Kalbhenn and Bernauer (2012), each water case underlying the interactions is defined as a water management issue that manifests in multiple interrelated interactions.
For instance, the construction of a dam could represent a water case, while the protests of the downstream countries and of the affected stakeholders, the negotiations, and a possible international agreement would represent a series of events (conflict and cooperation) related to the specific case of the construction of that dam. Following Wolf et al. (2003 and 2009), conflictive and cooperative events were defined as water interactions. In this paper, we will refer to the water interactions irrespective of their specific nature and to more generic water issues or cases, defined as the water management aspects determining the interconnected water interactions, as in Wolf et al. (2003) and Yoffe et al. (2003). The likelihood of having water interactions is an indicator of the complexity of the underlying water issue that, if not properly and promptly addressed by the actors involved, could eventually increase the hydro-political risk. Our index should therefore be interpreted as a measure of the magnitude of water issues between specific actors in a specific basin. The rationale behind this is the following: if there are interactions about shared water resources in a specific basin, whether tension or cooperation events, there is a water allocation/management/quality issue. Whether a water management issue leads to cooperative or conflictive behaviors is therefore unrelated to the nature of the water issue itself. It pertains more to the political, cultural, institutional, and socioeconomic conditions of the actors involved. The presence of a water issue is in itself an indicator of risk: it is a necessary condition for having water interactions, but not a sufficient condition for conflict, cooperation, or both over water. Some water interactions end up being conflictive, some others cooperative, but all imply the existence of a water issue.
This study focuses on the analysis of the probability of having water issues, their intensity, and their determinants: the necessary conditions for ensuing water tensions or cooperation. The analysis of the factors that lead the actors involved to manage a water issue with a more confrontational or cooperative approach falls outside the scope of this research.

Determinants of cross-border water issues
Economic, statistical and game theory approaches have been used to analyze international dynamics over transboundary waters (Dinar, 2004). Some studies analyzed the dynamics of conflict and cooperation, here defined as water interactions (Böhmelt et al., 2014; Brochmann and Gleditsch, 2012; De Stefano et al., 2012, 2010b; Wolf, 2007; Wolf et al., 2003); others focused on the likelihood of reaching bi- or multi-lateral agreements between countries (Dinar et al., 2011; Espey and Towfique, 2004; Zawahri and Mitchell, 2011); additional analyses used the existence of treaties and River Basin Organizations (RBOs) as a proxy to quantify the institutional resilience toward potential hydro-political tensions (De Stefano et al., 2017; Petersen-Perlman, 2016). The likelihood of cooperating and finally reaching water agreements is influenced by time-invariant factors, such as geographical and topographic characteristics, and time-varying correlates, such as climatic variables and socioeconomic characteristics. Quantitative analysis was used to find the causal relations leading to conflicting or cooperative interactions and the formation of treaties. Wolf (2007 and 2009) underlined the central role of the quality, stability and strength of the institutions, highlighting the need for stronger institutional frameworks to cope with future challenges. Zawahri and Mitchell (2011) argued that the formation of treaties is a by-product of state interest, transaction costs, and distribution of power. Dinar et al. (2011) analyzed the main reasons why some treaties would be more likely discussed in some basins relative to others. They found that scarcity and cooperation follow an inverted U-shaped curvilinear relation: cooperation is higher when water scarcity is moderate than when it is very high or very low (also in Dinar et al., 2010). Extreme scarcity situations were found to be inhibiting factors (Dinar et al., 2011).
Institutional stability and effective past agreements oriented toward a fair and efficient water allocation between riparians were found to be cooperation boosting factors (Dinar et al., 2015). These and other studies (Beck et al., 2014; Brochmann and Gleditsch, 2012; Espey and Towfique, 2004) found evidence of the influence of economic factors, trade dependency, virtual water trade, presence of water infrastructures, quality of the institutions, governance, presence of supra-national authorities, and cultural background on the bi- and multi-lateral relations between the countries facing allocation, management, and pollution problems over shared water. A large part of these analyses highlighted the non-linear nature of the relations between water interactions and correlated factors.
In this study, we propose for the first time the use of a machine learning approach to quantitatively assess the linear and non-linear relations between the hydro-political interactions recorded and the time-varying and time-invariant biophysical, topographic, and socioeconomic explanatory variables. We aim at combining information at transboundary river basin level with gridded data into an empirically based data driven index. A similar objective was pursued in the AQUEDUCT Water Risk Atlas developed by the World Resources Institute (WRI) (Gassert et al., 2014). AQUEDUCT did not specifically refer to hydro-political risk, but rather to a global database of 12 main indicators about water quantity, quality, and regulatory framework, from about 15000 basins from all over the world that, once aggregated, formed a composite index defined as overall water risk. A similar gridded approach was used to calculate the Global Water Security Index (GWSI), an index based on information about water availability, accessibility, quality and management, aggregated through spatial Multi Criteria Analysis (Gain et al., 2016). Other examples exist at basin level spatial resolution, such as the Transboundary Waters Assessment Programme (TWAP) project (UNEP-DHI and UNEP, 2016). The hydro-political tension component in TWAP is part of the overall Governance indicator. This is based on three sub-indicators: 1) Legal Framework, 2) Enabling Environment, and 3) Hydro-Political Tensions. The first is based on the rationale that governance of transboundary basins is driven by the existence of bi- or multi-lateral treaties regulating interactions between the countries.
Legal Framework is based on the presence in the treaties of the following principles: (a) "equitable and reasonable utilization; (b) not causing significant harm; (c) environmental protection; (d) cooperation and information exchange; (e) notification, consultation or negotiation; (f) consultation and peaceful settlement of disputes" (quoted from UNEP-DHI and UNEP, 2016). The coverage of all the legal principles by the previous treaties, jointly with the ratification of the UN WC Convention and/or UNECE Water Convention by the countries involved, is considered a factor reducing risk. The Enabling Environment pertains to the single countries' capability of planning, regulating, managing, and governing water resources (UNEP-DHI and UNEP, 2016). The level of Hydro-Political Tension is obtained by combining institutional vulnerability with planned infrastructural development, where institutional vulnerability is higher when the riparian countries have not specifically regulated water allocation and the management of flow variability in a formal treaty, when they have not agreed on a conflict resolution mechanism, and when the basin is not administered by an RBO (UNEP-DHI and UNEP, 2016). The indicator was designed by assigning a score to specific sub-indicators, then aggregated and ranked, following the methodology developed in existing literature (De Stefano et al., 2012, 2010b, 2010a). It is based on information derived from the water treaties database (International Freshwater Treaties Database, IFTD) (De Stefano et al., 2010b) created within the Transboundary Freshwater Disputes Database (TFDD). This work was then further developed in De Stefano et al. (2017). In this updated version, the current institutional resilience of the transboundary basins was calculated as a function of existing treaties and river basin managing institutions (RBOs), similarly to the methodology used in the TWAP project (De Stefano et al., 2012; UNEP-DHI and UNEP, 2016).
The hydro-political vulnerability of the basins was then quantified through a multi-criteria analysis relating institutional resilience to destabilizing factors, such as planned infrastructural development, and to exacerbating factors, such as low income, climate driven water variability, reservoir depletion, armed internal or international conflicts, and past water disputes (De Stefano et al., 2017). The results were produced at basin level: thirty-six river basins were classified within the high and very high categories of hydro-political risk.
We propose a different, somewhat complementary, approach combining the information at transboundary basin level with local scale gridded data processed in an empirically based model designed to take into account linear and non-linear combinations between biophysical and socioeconomic stressors and international water interactions. In a second step, rather than assigning scores and aggregating sub-indicators in ranked relative risk categories, we used the model fit with past observations to construct a baseline and future projected scenarios. Similarly to other approaches described in this section, our index combines information at country level with gridded data, but, unlike previous approaches, our outcome variable is computed at gridded resolution. This makes the hereby proposed index spatially explicit and completely data driven.

Methodology and data
The empirically based analysis was designed upon concepts derived from political science and environmental economics, with a set of indicators selected to cover information about: river basin freshwater availability; climate stress; human pressure on water resources; socioeconomic conditions, including institutional development and power imbalance; and topographic characteristics. A tool derived from machine learning, the Random Forests regression algorithm (Breiman, 2001), was used to estimate the relations between the indicators from each of the groups and observed water interactions. The relative impact of each time-varying and time-invariant indicator was in this way assessed and empirically estimated using the water related events database International River Basin Conflict and Cooperation (IRCC) (Kalbhenn and Bernauer, 2012). The Random Forests regression model was trained on historical information covering an eleven-year period (1997-2007). The medium term mean (1997-2012) of the selected indicators at high spatial resolution (0.25 degrees) was then used to estimate the spatial distribution of the likelihood of experiencing hydro-political interactions (baseline scenario). Future scenarios for 2050 and 2100 were calculated by using the multi-model mean of the daily temperature and precipitation estimates from 5 GCMs belonging to the Coupled Model Intercomparison Project Phase 5 (CMIP5) (Taylor et al., 2012), considering two different emission and radiative forcing scenarios, Representative Concentration Pathways (RCP) 4.5 and 8.5 (Meinshausen et al., 2011, 2009), for the 15-year period before the reference time (respectively, 2036-2050 and 2086-2100).

Data
Data about historical water interactions are the basis for hydro-political studies. Two main global dyadic databases of historical water related events are currently available: the Transboundary Freshwater Dispute Database (TFDD) International Water Event Database (IWED) developed by the Oregon State University with the Basins at Risk project (Yoffe et al., 2003, 2004; later updated in De Stefano et al., 2010b) 3 , providing information about international water basin interactions between 1948 and 2008; and the International River Cooperation and Conflict database (IRCC), reporting water related issues between 1997 and 2007 (Kalbhenn and Bernauer, 2012). Both databases are set up in the form of water related events at dyad-basin level. Each national territorial unit in a specific river basin is defined as a basin-country unit (BCU), and each of the possible pairs of BCUs in the same basin is classified as a dyad. Although its temporal coverage (11 years) is limited, the IRCC database was preferred in this analysis for the higher number of non-neutral interactions reported (4797 in IRCC vs 1985 in TFDD) (Kalbhenn and Bernauer, 2012), and for the data collection methodology, coded from a homogeneous set of information. The dyadic characterization of the database, with a geographical scale limited to bilateral country interactions for each transboundary basin, represents a limiting factor for a detailed spatial analysis of the biophysical and socioeconomic drivers determining the national and international water related issues. Moreover, due to the nature of the algorithms used for the creation of the database, which mine water coded events from international news datasets, the event data are characterized by an uneven geographical distribution of the observations. More details and alternative water interactions databases are presented in Annex A.
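The dyad-basin structure described above can be illustrated with a minimal Python sketch. It only shows how every pair of BCUs within the same basin yields one dyad; the function name `build_dyads` and the basin and country labels are hypothetical placeholders, not entries from the IRCC or TFDD databases.

```python
from itertools import combinations

def build_dyads(basin_countries):
    """Return one (basin, country_a, country_b) tuple per pair of BCUs in a basin."""
    dyads = []
    for basin, countries in basin_countries.items():
        # Each country's territory inside the basin counts as one BCU.
        bcus = sorted(set(countries))
        # Every unordered pair of BCUs in the same basin forms a dyad.
        for a, b in combinations(bcus, 2):
            dyads.append((basin, a, b))
    return dyads

# Hypothetical riparian lists, for illustration only.
example = {"Basin A": ["X", "Y", "Z"], "Basin B": ["X", "Y"]}
dyads = build_dyads(example)
```

A basin shared by n countries produces n(n-1)/2 dyads, so the number of dyad-basin combinations grows quickly in basins with many riparians.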
The hydro-meteorological information used in this analysis was derived from the highly spatially detailed climate data of the Multi-Source Weighted-Ensemble Precipitation (MSWEP) database (Beck et al., 2017). We calculated a precipitation anomaly indicator based only on variation in the temporal distribution of precipitation: the Standardized Precipitation Index (SPI) (McKee et al., 1993). This climate proxy, measuring rainfall anomalies, is widely used for drought quantification and monitoring (WMO, 2012; details in Annex A). Temperature data were derived from the WATCH Forcing Data methodology applied to ERA-Interim (WFDEI) dataset (Weedon et al., 2014). Water availability was assessed using a modified version of the Falkenmark Water Stress Indicator (Falkenmark and Lannerstad, 2005), which also considers the water resources flowing from upstream, calculated using the 0.1 degrees resolution LISFLOOD global hydrological model (De Roo et al., in preparation). River basin topographic data used for the analysis were mainly represented by the river flow accumulation, a proxy for the upstream/downstream relations, and the share of national territory in the basin (Beck et al., 2014).
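As a rough illustration of a per capita water availability indicator that also counts upstream inflow, the following Python sketch may help. It is not the modified indicator used in this study: the function names are hypothetical, the simple sum of local runoff and upstream inflow is an assumption, and the thresholds are the classic Falkenmark classes (1700, 1000 and 500 m3 per person per year).

```python
def water_per_capita(local_runoff_m3, upstream_inflow_m3, population):
    """Annual water availability per person (m3/person/year).

    Assumption: upstream inflow is simply added to local runoff.
    """
    if population <= 0:
        return float("inf")  # no demand in an unpopulated cell
    return (local_runoff_m3 + upstream_inflow_m3) / population

def falkenmark_class(m3_per_capita):
    """Classic Falkenmark classes; the study may use different cut-offs."""
    if m3_per_capita < 500:
        return "absolute scarcity"
    if m3_per_capita < 1000:
        return "scarcity"
    if m3_per_capita < 1700:
        return "stress"
    return "no stress"

# Hypothetical cell: 0.8 km3 of local runoff, 0.4 km3 from upstream, 1 million people.
availability = water_per_capita(8e8, 4e8, 1e6)
label = falkenmark_class(availability)
```

In this hypothetical cell, ignoring the upstream contribution would move the value from the "stress" class to the "scarcity" class, which is why accounting for upstream flows matters in transboundary basins.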
Gross Domestic Product (GDP) statistics were derived from Gleditsch (2002). The Governance indicator was calculated as the mean value of the six indicators (voice and accountability; political stability and absence of violence; government effectiveness; regulatory quality; rule of law; control of corruption) of the Worldwide Governance Indicators (WGI) project (Kaufmann et al., 2010). Agriculture (share of GDP) and rural population (share of the total) were derived from the World Development Indicator database (World Bank, 2018). Population dynamics were derived from the Gridded Population of the World (GPW, v4) database (CIESIN, 2015) downscaled by the EC Joint Research Centre (Freire and Pesaresi, 2015). Political and military importance of the countries was represented in the model through the Composite Index of National Capability (CINC) derived from the National Material Capabilities (NMC v5.0) database within the Correlates of War project (CoW) (Singer et al., 1972) 4 . CINC is calculated as a share of world power as a function of six variables, namely: total population, urban population, iron and steel production, military expenditure, military personnel, and primary energy consumption (Singer et al., 1972). The information about past bi- or multi-lateral water treaties was derived from the International Freshwater Treaty Database, IFTD (Oregon State University, Transboundary Freshwater Dispute Database TFDD) 5 (De Stefano et al., 2012).
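The two composite socioeconomic indicators described above can be sketched in a few lines of Python. The Governance indicator follows the text (mean of the six WGI dimensions). For CINC, the sketch assumes the usual Correlates of War construction, in which the index is the average of a country's shares of the world total across the six components; the dictionary keys and function names are hypothetical.

```python
from statistics import mean

WGI_DIMENSIONS = (
    "voice_and_accountability", "political_stability", "government_effectiveness",
    "regulatory_quality", "rule_of_law", "control_of_corruption",
)

CINC_COMPONENTS = (
    "total_population", "urban_population", "iron_steel_production",
    "military_expenditure", "military_personnel", "energy_consumption",
)

def governance_indicator(wgi_scores):
    """Mean of the six WGI dimension scores for one country-year."""
    return mean(wgi_scores[d] for d in WGI_DIMENSIONS)

def cinc(country, world_totals):
    """Assumed construction: average share of the world total over six components."""
    return mean(country[c] / world_totals[c] for c in CINC_COMPONENTS)

# Hypothetical values, for illustration only.
scores = dict(zip(WGI_DIMENSIONS, (-1.0, 0.0, 1.0, 0.5, -0.5, 0.0)))
governance = governance_indicator(scores)
country = {c: 10.0 for c in CINC_COMPONENTS}
world = {c: 100.0 for c in CINC_COMPONENTS}
capability = cinc(country, world)
```

A power imbalance measure for a dyad could then be derived from the two countries' CINC values, although the exact form used in the model is described in Annex A rather than here.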
The climate projections data used in this study belong to the NASA Earth Exchange Global Daily Downscaled Projections (NASA NEX-GDDP) dataset downscaled (0.25 degrees) and bias corrected using the Bias-Correction Spatial Disaggregation (BCSD) methodology described in Thrasher et al. (2012). Due to computational constraints, we selected 5 out of the 21 climate models included in the NASA NEX-GDDP (details in Annex A), chosen on the basis of the structural differences among them, as described in Knutti et al. (2013).
Population density for the years 2050 and 2100 was estimated by applying the country specific population growth rates estimated by the World Population , 2015). Main statistics and variable descriptions are summarized in Annex A (Table A1); further information about data sources can be found in Table A2.

Methodology: random forests regression
Different methodologies have been used in the literature to analyze dyadic data. Most of them were not designed to capture non-linear interactions. For this reason, in this work we propose a different approach, applying the Random Forest (RF) regression algorithm (Breiman, 2001). RF is a Classification and Regression Tree (CART) based tool that involves an ensemble of regression trees. These are calculated on random subsets of the data, split on the basis of randomly selected features among the independent variables (Liaw and Wiener, 2002; Strobl et al., 2009; Welling et al., 2016). RF is based on the decision tree learning approach popular for non-linear multi-variate classification and regression (Breiman, 2001; Tin Kam Ho, 1998). In this study, we will refer to RF regression, which is slightly different from the classification algorithm and is structured in the four subsequent steps described below (RF algorithm logical steps, calibration, and validation procedures are summarized in Annex B).
RF Model training: the model was used to find the linear and non-linear relations between the dependent variable, a logarithmic transformation of the number of water interactions for each of the country-dyad/basin combinations observed over the 11 years (1997-2007) of available data, and the 19 independent variables selected for the analysis (variable selection was performed by optimizing the model performance as described in Annex B). Out of the 19 variables used in the final specification of the model (see Table A1 for details), the 2 representing the basin's topography were time invariant, while the remaining 17, representing biophysical and socioeconomic factors, were time varying (one is the time trend control). The modeling set up should be interpreted as the relation between a percent variation of the objective variable (the measure of the intensity of water issues) and the variation of the absolute values of the independent variables. Since the relations are, in the majority of cases, non-linear, when an independent variable is manipulated the variation of the objective variable could be positive or negative depending on the values of the remaining set of independent factors.
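To make the training step more concrete, the following is a toy pure-Python sketch of random forest regression: bootstrap samples, a random feature subset at each node, and averaging over trees. It is not the implementation used in this study. The `log(1 + n)` transformation (chosen so that zero counts remain defined), the hyperparameters, the two-feature synthetic data and all function names are assumptions made for illustration.

```python
import math
import random

def log_count(n):
    # Assumption: log(1 + n), so dyad-years with zero interactions stay defined.
    return math.log1p(n)

def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, ys, features):
    """Return (error, feature, threshold) minimising squared error, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    return best

def grow_tree(rows, ys, rng, n_try, depth=0, max_depth=6):
    leaf = sum(ys) / len(ys)
    if depth >= max_depth or len(ys) < 4 or len(set(ys)) == 1:
        return leaf
    # Random feature subset at each node: the "random" part of the forest.
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, ys, features)
    if split is None:  # the sampled features are constant in this node
        return leaf
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], rng, n_try, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], rng, n_try, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, ys, n_trees=25, n_try=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx], [ys[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Synthetic data: interactions occur only where density is high AND water is low.
rows = [(density, water) for density in range(10) for water in range(5)]
counts = [2 * d if w < 2 else 0 for d, w in rows]
ys = [log_count(c) for c in counts]
forest = fit_forest(rows, ys)
high = predict_forest(forest, (9, 0))  # dense, water-poor
low = predict_forest(forest, (0, 4))   # sparse, water-rich
```

The synthetic target depends on an interaction between the two features, which is the kind of non-linear relation that a tree ensemble can pick up without it being specified in advance.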
Baseline: the RF model set up in the previous step was used to construct a baseline (or reference) scenario of the likelihood of hydro-political issues at grid-cell level (each cell has dimensions 0.25 × 0.25 degrees, approximately 27 × 27 km at the equator). In order to reduce the bias derived from climate variability and possible temporary shocks in the specific independent variables, the baseline scenario was calculated by averaging the values of the independent variables for the period 1997-2012 at grid-cell level. Variables' values at grid-cell level are cell specific (as for the 8 climatic variables 6 ; population density 7 ; and water availability 8 ) or the same for all the cells of a country (as for all the socioeconomic variables 9 ). The production of a baseline or reference scenario makes it possible to map, at global level, the spatial distribution of the likelihood of having water interactions (our index) under present conditions of the factors determining water interactions.
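The two numerical ingredients of the baseline step, the grid-cell size and the 1997-2012 averaging, can be sketched as follows. This is an illustration under stated assumptions: one degree is taken as about 111.32 km at the equator, the east-west width is scaled by the cosine of latitude, and the function names and yearly values are hypothetical.

```python
import math
from statistics import mean

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator

def cell_size_km(resolution_deg, latitude_deg):
    """Approximate (east-west, north-south) size of a grid cell in km."""
    north_south = resolution_deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south

def baseline_mean(yearly_values, start=1997, end=2012):
    """Average one variable in one cell over the baseline period."""
    return mean(v for year, v in yearly_values.items() if start <= year <= end)

ew_equator, ns_equator = cell_size_km(0.25, 0.0)
ew_60, ns_60 = cell_size_km(0.25, 60.0)

# Hypothetical yearly values for one cell; years outside 1997-2012 are ignored.
cell_values = {1996: 100.0, 1997: 1.0, 2012: 3.0, 2013: 100.0}
cell_baseline = baseline_mean(cell_values)
```

Under these assumptions a 0.25 degree cell measures about 27.8 km on each side at the equator, consistent with the approximate figure quoted above, and narrows in the east-west direction at higher latitudes.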
Projections: using a procedure similar to the one described for the baseline, the model was used to map the variations in the objective variable in response to four possible future climate and population scenarios. The future conditions are based on climate projections to the years 2050 and 2100 under two different degrees of climate change (RCP 4.5, a moderate climate change scenario; and RCP 8.5, a severe climate change scenario). In order to reduce the bias arising from any specific climate modeling exercise, we averaged the downscaled and bias-corrected climate projections from 5 GCMs into a multi-model mean. Climate projections were combined with population growth scenarios at grid-cell resolution, calculated by applying country-specific population growth rates for the years 2050 and 2100 (UN/DESA, 2017) to the baseline population density (CIESIN, 2015; Freire and Pesaresi, 2015).
Comparison of the future and baseline scenarios to assess the change in the index caused by population and climate dynamics.
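The projection and comparison steps can be illustrated with a minimal Python sketch. It assumes a single grid cell and hypothetical numbers that are not taken from the study; the function names (`multi_model_mean`, `project_density`, `percent_change`) are illustrative and do not come from the original code.

```python
from statistics import fmean


def multi_model_mean(model_values):
    """Average one grid cell's projected value across climate models."""
    return fmean(model_values)


def project_density(baseline_density, growth_factor):
    """Scale a cell's baseline population density by its country's growth factor."""
    return baseline_density * growth_factor


def percent_change(baseline_index, future_index):
    """Relative change of the index with respect to the baseline, in percent."""
    return 100.0 * (future_index - baseline_index) / baseline_index


# Hypothetical values for one grid cell (assumptions, not study data).
precip_by_gcm = [610.0, 580.0, 640.0, 600.0, 570.0]  # mm/year from 5 GCMs
mean_precip = multi_model_mean(precip_by_gcm)

# 80 people per km2 at baseline, country population growing by 50%.
future_density = project_density(80.0, 1.5)

# Index rising from 0.40 (baseline) to 0.50 (future scenario).
change = percent_change(0.40, 0.50)
```

In the study the future index value would come from the fitted RF model evaluated on the projected inputs; here it is simply supplied by hand to show the comparison.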

Random forest model results
The RF model was trained using the entire set of observations (N = 11801). Each observation reports the logarithmic transformation of the number of hydro-political interactions for a specific dyad of countries (749 country dyadic combinations considered in the final panel) in a specific river basin (260 transboundary basins included) for a specific year (11 years). Of the final 11801 observations, 10062 reported no water interactions, while 1739 reported at least 1 interaction in the BCU/year combination 10 . The overall RF model was found to explain about 70% of the variation (pseudo R², details available in Annex B). Variable importance estimates for the RF model highlighted that socioeconomic variables play the most important role. Population density was the variable that most influenced the capability of the model to capture the variation in the set of observations considered in this analysis. The time trend control was the second most important variable in capturing the variability of the data: this is likely because the data collection algorithm of the hydro-political event dataset is strongly influenced by the increase in news availability over the period under consideration, which coincided with the development of the internet. The upstream/downstream dynamics (represented by the flow accumulation), together with territorial imbalance (area difference) and power imbalance (Composite Index of National Capability, CINC), follow population density in the ranking. Per capita water availability (Falkenmark Index) was the most important of the biophysical variables, while variables associated with precipitation and temperature fall in the mid-lower portion of the permutation-based variable importance ranking (Fig. A3).
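The idea behind a permutation-based importance ranking can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the procedure used to produce Fig. A3: the toy model, the data, and the function names are hypothetical, and the error measure (mean squared error) is an assumption.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in error after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base_error = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(model, permuted, targets) - base_error)
    return importances


def toy_model(row):
    """Hypothetical fitted model: depends on feature 0 only."""
    return 2.0 * row[0]


rows = [(float(i), float(i % 3)) for i in range(20)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets, n_features=2)
```

A feature the model ignores receives an importance of zero, while shuffling a feature the model relies on increases the error; ranking variables by this increase gives an ordering of the kind discussed above.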
The analysis highlighted the non-linear nature of the relations between certain variables and their impact on hydro-political interactions (further details in Annex B; partial dependence plots in Fig. A4). The model finds an inverse U-shaped relation between population density and water interactions: sparsely populated areas were associated with a lower probability of having water issues; the likelihood increases until reaching its maximum at about 100 people km⁻². Above this value the relation decreases, remaining positive and leveling to zero for values above 400 people km⁻². Almost opposite results are found for the Falkenmark Index, indicating per capita water availability including the amount of resources flowing from upstream: in the areas where the water availability is the lowest, increasing values are associated with a marginal decrease in the likelihood of water issues. The slightly negative relation is however non-linear: it is positive in areas where relatively more water is available and almost negligible in water abundant areas. Relative territorial supremacy on the basin (difference in the national territory in the shared watershed) was found to have an inverse U-shaped relation: the likelihood of water interactions appears to be very low among actors occupying similar territorial extensions of the shared river basin; similar conclusions could be drawn for countries occupying the majority of the basin's territorial extension, while hydro-political interactions are found to be more likely in the middle cases. Low to medium levels of national power (Composite Index of National Capability) were found to be associated with a higher likelihood of experiencing water interactions. Very upstream and very downstream countries are found to be more likely to get involved in water interactions. Rural and agriculture-dependent economies and, in general, lower to middle income countries are more prone to experience water issues.
6 TOT_Precip, MIN_Precip, SPI_12, AVG_Temp, TempMAX, TempMin, Temp_delta, Temp_seasonal_var in Table A1.
7 Pop_density in Table A1.
8 Falkenmark_upst in Table A1.
9 Rural_pop, GDP, Agriculture, Governance_ind, cinc_mean, IFTD_treaties in Table A1.
10 With a maximum of 166 interactions between Hungary and Romania in the Danube river basin in the year 2000.
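Partial dependence curves of the kind referenced in Fig. A4 can be computed as sketched below. The sketch assumes a generic prediction function; the toy model is hypothetical and is built to peak at a density of 100 purely to mimic the reported shape, not to reproduce the fitted RF model.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row, keep the others as observed.
        preds = [model(r[:feature] + (value,) + r[feature + 1:]) for r in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical response: inverse U in density, linear in a second factor."""
    density, other = row
    return -((density - 100.0) ** 2) / 10000.0 + 0.1 * other


rows = [(50.0, 1.0), (150.0, 2.0), (300.0, 3.0)]
grid = [0.0, 50.0, 100.0, 200.0, 400.0]
curve = partial_dependence(toy_model, rows, feature=0, grid=grid)
peak = grid[curve.index(max(curve))]
```

Because the remaining features are averaged over their observed values, the curve isolates the marginal effect of the chosen variable, which is what allows non-linear shapes such as the inverse U to be read off directly.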

Model findings discussion
Socioeconomic and water demand side factors were found to be relatively more important in determining hydro-political interactions than supply side factors such as shocks in precipitation or other climatic variables. Similar findings were highlighted in Böhmelt et al. (2014), where population pressure, agricultural productivity, and economic development in general were identified as important determinants of the formation of water disputes, which are mitigated into cooperative interactions where institutions are solid and political conditions stable. Population dynamics were found to be important drivers in other studies. Brochmann and Gleditsch (2012), among others, found that countries characterized by very large or very small populations are more likely to get involved in conflicts over water. In our case, population density is a proxy for human pressure on water resources, but population is also linked with the power of a nation and its economic and socio-political capabilities. Very sparsely populated areas were found to be less likely to experience water interactions; however, where rural communities (> 50% of the population living in rural areas) depend heavily on agricultural productivity (> 30% of the GDP) for their economic development, the combination of the three factors was found to make water issues likely (a 3-D dependence plot is available in Fig. A5).
Increasing population density, by increasing human pressure on a limited set of resources, was found likely to increase the probability of experiencing water related issues, but this relation was found to be non-linear. This could be explained by the role of hydraulic infrastructure in mitigating water stress in densely populated areas (McDonald et al., 2014), and by the extreme consequences that follow when the capacity of the water infrastructure is no longer sufficient to cope with climatic variability and population growth. Similarly, in Dinar et al. (2011), increasing human pressure on water resources, determining water scarcity, was found to have an inverse U-shaped relation with cooperative hydro-political interactions, while extreme cases were more associated with tensions. The inverse U-shaped relation between per capita water availability and the likelihood of hydro-political interactions also confirms the conclusions of Dinar et al. (2010), who found cooperative water interactions more likely in situations of average water availability. Territorial and power imbalances were found to be significant drivers of hydro-political interactions in the main literature available (Brochmann and Gleditsch, 2012; Gleditsch et al., 2006; Zawahri and Mitchell, 2011). This study's findings about upstream/downstream dynamics are consistent with the study by Munia et al. (2016), which quantified the increasing water stress in the downstream part of the basins due to upstream uses, and its connection with increasing water tensions. Our results show an increasing trend of water related interactions over time. On the one hand, institutional development brought increasing collaboration over water related international issues (De Stefano et al., 2012; Dinar et al., 2015; Kalbhenn and Bernauer, 2012; Wolf, 2009). On the other hand, the trend is (at least partially) explained by the increasing coverage of local news about water issues by the international press.
The way water event datasets were developed is, in fact, strongly influenced by the publication of news in the main western languages, a sector that has been radically changed by the digital revolution. As noted in De Stefano et al. (2010b), the scarce representation of some areas of the world in the water related event datasets is mainly due to the fact that the search was performed by analyzing international and local news in English. This methodology proved rather unsuccessful in capturing information published in local languages or news from areas not completely covered, such as war zones or politically or technologically isolated countries. For this reason, data about historical water related events represent the main limitation of the studies in this specific field.

Model application to calculate the likelihood of hydro-political interactions under current and upcoming conditions
One of the main objectives of this study was to produce a spatially explicit, data driven index that helps policy makers monitor the dynamics of the factors identified as influential in determining water related issues, and identify the areas where cooperation over water is most needed to address, in a timely manner, criticalities that could eventually lead to water disputes. In order to achieve this objective, we calculated the medium-term mean (1997-2012, when available) of the selected indicators at the highest spatial resolution allowed by data availability (0.25 degrees), and we used the estimated RF model to draw the spatial distribution of the likelihood of hydro-political interactions. Not all the variables were available at sub-country resolution; in particular, 10 variables were available at grid-cell level, 5 at country level, and 3 at country/basin level (more details in Table A1). The spatial distribution of the index within country borders is therefore driven by climatic, population, and water availability drivers: an unavoidable simplification caused by the limited availability of data at sub-country and gridded level, partially compensated by the fact that the spatial distribution of some variables, such as the national capability (CINC), can be considered fairly homogeneous at intrastate level. A high likelihood of hydro-political interactions identifies the areas where water issues are more likely to arise. Although this index does not give information about the degree of cooperation or conflict associated with the specific interaction, it identifies the areas of possible hydro-political risk that would be best addressed through cooperative action (Fig. 1). The index was calculated at pixel level; the value attributed to each specific basin is the average of all the pixels within its boundaries.
To ensure the comparability of the different variables and indicators, the corresponding values were normalized across the transboundary basins to a 0-1 range through a simple min-max normalization procedure. High values of the likelihood of hydro-political interactions are linked with larger water stress, due to lack of water supply and/or human pressure, in a more vulnerable institutional and socioeconomic context. The spatial distribution of the index highlights the areas where issues related to water resources are more likely to be experienced. A high likelihood of water related issues could be determined by potential water scarcity in densely populated areas, as in the case of the Nile Delta; the Nile is one of the basins that reach a high average value of the index (score 0.761). Socioeconomic and political conditions and the distribution of water resources determine the differences in the index for the Upper Nile. A combination of low governance, high population density, physical water stress, and almost complete economic dependency on agricultural activities shaped the distribution of the index in the Ganges-Brahmaputra (highest in our ranking, score 1.000) and Indus basins (score 0.675). A different climatic area, with more pronounced precipitation stress, lower population density, and lower economic dependency on agricultural production, characterized the results for the Euphrates-Tigris river basin (score 0.592). Population density, high economic dependence on agriculture, and human pressure on water resources determine the distribution of the index on the lower Niger (score 0.447), in particular within the borders of Burkina Faso and Nigeria. Population distribution and socioeconomic conditions shape the index in the Congo basin (score 0.432), while a relatively good governance level characterizes the Zambezi river basin, with hotspots in the most populated areas and increasing values towards the outlet of the basin (overall score 0.431).
Human pressure and relatively heterogeneous socioeconomic conditions determine the need for water cooperation in the Mekong basin (score 0.492). Despite the evident progress made after EU integration, our results highlight a high likelihood of water related issues in specific portions of the Danube basin (score 0.499), especially in the eastern and southern parts, where institutional development still needs to be consolidated and economic dependency on agriculture remains relevant. A complete list of the results for the transboundary river basins is available in Annex D (Table A3).
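The aggregation from pixels to basins and the min-max normalization can be sketched as follows. The order of the two operations (averaging first, then normalizing across basins) is an assumption made for illustration, and the basin labels and pixel values are hypothetical.

```python
def basin_means(pixel_values, pixel_basin):
    """Average the pixel-level index over the pixels belonging to each basin."""
    totals, counts = {}, {}
    for value, basin in zip(pixel_values, pixel_basin):
        totals[basin] = totals.get(basin, 0.0) + value
        counts[basin] = counts.get(basin, 0) + 1
    return {basin: totals[basin] / counts[basin] for basin in totals}


def min_max(scores):
    """Rescale basin scores to the 0-1 range: (x - min) / (max - min)."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:  # degenerate case: all basins share the same value
        return {basin: 0.0 for basin in scores}
    return {basin: (v - low) / (high - low) for basin, v in scores.items()}


# Hypothetical pixel-level index values and the basin each pixel falls in.
pixels = [2.0, 4.0, 1.0, 1.0, 6.0]
basins = ["A", "A", "B", "B", "C"]

means = basin_means(pixels, basins)
scores = min_max(means)
```

Under min-max normalization the basin with the highest average always scores exactly 1 and the lowest exactly 0, which is consistent with the top-ranked basin being reported with a score of 1.000.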
Some of the areas highlighted in the results shown above (and in Table A3) are well known hotspots for hydro-political issues. Other areas are theaters of national or international political tensions not directly related to water. Our study focuses on water interactions as a measure of the magnitude of water issues, so a direct comparison with previous studies aiming to identify basins at risk of future water tensions is not possible; nevertheless, the results of the different approaches are aligned. De Stefano et al. (2017) compared the basins at risk identified using their approach with the ones highlighted in the two previous assessments (Wolf et al., 2003). Of the 12 basins found to be more likely to experience water issues in this study (Table A3), 10 are identified as basins at risk in previous analyses, namely: Ganges/Brahmaputra, Pearl/Bei Jiang (De Stefano et al., 2017), Nile, Feni (or Fenney), Indus (Wolf et al., 2003), Colorado, Tarim (De Stefano et al., 2017), Shatt al-Arab - Tigris/Euphrates (Wolf et al., 2003), Hari, and Irrawaddy (De Stefano et al., 2017; Wolf et al., 2003). Therefore, the probability of observing hydro-political interactions is to some extent correlated with the hydro-political risk analyses conducted in previous studies identifying basins at risk. This supports the idea that the index proposed herein should be considered for systematic application in support of the assessment of SDG 6, in particular as regards the impacts of potential future biophysical or socio-environmental changes on the likelihood of hydro-political issues at global scale. The proposed index can also be used to assess interlinkages with other SDGs, in particular SDG 16 on peace, justice and institutions. In order to achieve a global perspective, our analysis was also extended outside the borders of the international river basins initially included in the analytical framework (Fig. 2 and Fig. A7 in the Annex).
The results outside the boundaries of the international river basins, and in the portions of those basins that are absent from or poorly represented in the database of hydro-political events used to fit the RF model, might be affected by a certain degree of error and should therefore be considered purely indicative.
The evolution of the index under future climate and population scenarios was estimated for the years 2050 and 2100 considering changes in population density, by applying UN/DESA population growth estimates to the 2015 data, and climate conditions, considering the multi-model mean of the projected precipitation and temperature for the periods 2036-2050 and 2086-2100 (Fig. 3; additional details in the Annex, Figs. A8 and A9). As mentioned above, population density is among the top drivers determining the likelihood of hydro-political interactions, while, conversely, climate factors are relatively less important in terms of magnitude, but more relevant in terms of the extent of the impacted area. The choice of the combination of climate and population dynamics as driver of change is motivated mainly by data availability. When alternative scenarios of other important variables and relevant dynamics, such as institutional development, become available, these could be taken into consideration as well.
Figure caption (fragment): In the radar chart, the normalized score of the main factors determining the likelihood in the specific river basins. Not all the variables explicitly used for the model are represented in the radar chart: the non-included factors, however, are derived from the climatic variables displayed.
Changes in population density are expected to exacerbate the anthropogenic pressure on water resources, the availability of which is strongly impacted by changes in climate. The combination of these two factors is estimated to increase the overall hydro-political risk. The likelihood of water related issues is expected to increase globally, with gridded values averaging between +36.4% in the case of moderate climate change (RCP 4.5) and +37.1% in the case of the more pessimistic climate scenario (RCP 8.5) for the year 2050, and between +39.3% and +46.8%, respectively, for the year 2100. Aggregating the results for the main transboundary river basins, and excluding the areas of the globe not falling in transboundary basins, the likelihood of experiencing hydro-political interactions was calculated to increase on average by between 74.9% (2050, RCP 4.5) and 95.3% (2100, RCP 8.5), especially in sub-Saharan Africa, South America, southern North America, Southern and Eastern Europe, and Central and Southern Asia. Table 1 presents the main statistics for the global projections, and the results for the transboundary basins most represented in the original IRCC database that were found likely to experience more hydro-political interactions in the future. The convergence of increasing trends in population density and temperature with decreasing precipitation is the combination that most influences the future increase in hydro-political risk, as for instance in the case of Southern Europe, Central Asia, and the Middle East (Figs. A8 and A9). Increasing population and temperature were found to dominate over increasing precipitation, as in the case of some tropical areas in sub-Saharan Africa and South-East Asia, in some cases due to the seasonal distribution of the rainfall. Harsher climate conditions were found to offset the benefits derived from decreasing population density, as in the case of North-Eastern China in the second half of the 21st century.
Only a handful of transboundary basins are expected to benefit from, or not be impacted by, the global climatic and population changes: one in Central Asia, the Chuy Basin (from -8% in 2050 RCP 4.5 to -37% in 2100 RCP 4.5); two in the north of the Scandinavian peninsula, the Tuloma Basin (between Russia and Finland, from -3% in 2050 under both RCPs to +3% in 2100 RCP 8.5) and the Näätämö Basin (at the border between Finland and Norway, from -3.2% in 2050 RCP 8.5 to +1% in 2100 RCP 8.5); and two in Ireland, the Bann Basin (from -13.4% in 2100 RCP 4.5 to +2% in 2100 RCP 8.5) and the Flurry Basin (from -17.4% in 2100 RCP 4.5 to -1.6% in 2100 RCP 8.5). All these basins are characterized by low population density and, for those in the northern latitudes, abundant water availability. A detailed list of the projected population and climatic variables, and of the estimated results in terms of hydro-political risk, is available in the Annex (Figs. A8, A9, and Table A3, respectively).
The increasing pressure that future climate and population dynamics are expected to pose upon the already problematic basins, especially in the Sahelian and Sub-Saharan Africa, Central, South and South-Eastern Asia, should be carefully monitored in order to avoid eventual hydro-political turmoil. In particular, the institutional and governance capacity of the national and supranational institutions (RBO's) should be enhanced in order to minimize the vulnerability of the specific biophysical and socioeconomic basin-systems to the increasing pressure. This aspect could significantly increase the capability of the river systems to deal with the increasing magnitude of change.

Conclusion
In this paper, we presented an innovative analysis of past hydro-political issues in international river basins and their determinants through the application of the Random Forest regression algorithm. Our analysis had two main goals: highlighting the factors that are more relevant in determining hydro-political interactions, also capturing the non-linear relations between the main drivers; and producing a tool able to map and monitor the evolution of hydro-political risk over space and time, under specific socioeconomic and biophysical scenarios. We did that by designing an empirically estimated, data-driven, and spatially explicit global index of the magnitude of hydro-political issues. The factors found to be more relevant in determining hydro-political interactions were, in order: population density, water availability (quantified through the Falkenmark index), upstream/downstream dynamics (represented by the flow accumulation), together with territorial imbalance (area difference) and power imbalance (Composite Index of National Capability, CINC), and climatic conditions. Current climatic and socioeconomic conditions were used to design a baseline scenario of the distribution of the likelihood of hydro-political interactions. This output makes it possible to map the areas within the basins where water management issues are more likely to arise under current conditions. Among the basins found to be more likely to experience water issues in this study, some were already identified as basins at risk in previous analyses, namely: Ganges/Brahmaputra, Pearl/Bei Jiang, Nile, Feni (or Fenney), Indus, Colorado, Tarim, Shatt al-Arab - Tigris/Euphrates, Hari, and Irrawaddy. The index proposed here adds the possibility to identify the most critical areas within the basin boundaries.
The baseline scenario was then compared to four distinct climate and population density projections, designed by combining the most updated bias corrected and spatially detailed climate and the most recent estimates of the future population changes. The results of this work allow the identification of the areas where water interactions are more likely to arise under present and upcoming conditions, and cooperation over water should be pursued to avoid possible hydro-political tensions. Future demographic and climatic conditions are expected to heavily increase the probability of experiencing water management issues in already stressed basins, such as the Nile, the Indus, the Colorado, the Feni, the Irrawaddy, the Orange, and the Okavango.
One of the characteristics of the analysis presented is that we chose not to make a distinction between past episodes of cooperation and dispute over water, using them collectively as water interactions, a measure of the magnitude of the associated water issue. This was motivated by the fact that water disputes have virtually never ended in violent conflicts, at least in the most recent historical eras, and by the consideration that the classification of positive (cooperative) and negative (conflictive) interactions in the event databases has often been arbitrary and ambiguous. Our focus was therefore more oriented towards understanding the preconditions that increase the likelihood of experiencing hydro-political interactions due to emerging water management issues. Rather than being exhaustive, our approach aims to stimulate interest in the hydro-political field of study by offering a new perspective: it applies a methodology that had never been considered before in this kind of analysis, it deals with aspects that go beyond institutional resilience alone, and it explores the possibility of creating a spatially explicit interactive tool able to assist stakeholders and policy makers in dealing with water related issues in different socioeconomic and climatic contexts through the analysis of what-if scenarios. Future studies could further develop the instrument by integrating updated socioeconomic, biophysical, and demographic projections. The difficulties and limitations encountered in this process were multiple. Besides the logical constraints that every global analysis has, this work, like the other studies in this field, is affected by limitations in data availability. Water event databases are extremely hard and expensive to collect and to manage. Data collection is mostly conducted through the application of mining algorithms operating on news databases available only in the most widely spoken western languages.
For this reason, the available datasets are necessarily biased and incomplete. Their temporal coverage is very limited, only eleven years in our case, and the sub-national geographic characterization of the specific water related events is, in the majority of cases, not considered. These factors make it very difficult to apply the existing datasets to the development of spatially explicit interactive decision making tools.
As stated above, the index presented in this paper could be applied to the Agenda 2030 monitoring activities, in particular for Target 6.5 (Water Resources Management), where the only indicator regarding hydro-political dynamics is 6.5.2, Proportion of transboundary basin area with an operational arrangement for water cooperation. This indicator mainly captures institutional resilience in transboundary basins, with no consideration for the other determining factors specifically analyzed in this study. Therefore, the use of the proposed index could provide a substantial contribution to moving from the mere recording of facts to the understanding of phenomena and the mechanisms behind them, which are prerequisites for the identification of effective sustainability policies.
As already noted in previous global analyses (De Stefano et al., 2017; Yoffe et al., 2003), the results of this study should be understood as an indicator of the areas that might require closer investigation under present and possible upcoming scenarios. We recommend further exploring the development of this analysis in regional or sub-regional contexts where more detailed data are available. Future research will focus on specific transnational river basins in developing countries where potential water stress, exacerbated by climate change and variability, rapid population growth, and unsustainable development, could further destabilize an already tumultuous political context.

Author attribution
F. Farinosi, G. Bidoglio, A. Reynaud, and C. Carmona-Moreno designed the study; F. Farinosi and G. Ceccherini developed the modeling framework; F. Farinosi processed data, coded the methodology, and performed the analysis; F. Farinosi, C. Giupponi, A. Reynaud, A. De Roo, G. Bidoglio, and C. Carmona-Moreno discussed the results; F. Farinosi with comments from the co-authors wrote the manuscript.

Conflict of interest
The authors declare no conflict of interest.

Funding
Arnaud Reynaud gratefully acknowledges the financial support of the Research Chair "Finance Durable et Investissement Responsable" and the Research Chair Amundi.

Acknowledgments
Climate projection scenarios used were from the NEX-GDDP dataset, prepared by the Climate Analytics Group and NASA Ames Research Center using the NASA Earth Exchange, and distributed by the NASA Center for Climate Simulation (NCCS). The authors would like to thank Stefano Barchiesi (JRC) and Mehmet Pinar (Edge Hill University) for the valuable comments provided during the preparation of the manuscript, Tatevik Hovhannissian and Ilenia Babetto (WFP) for proofreading the final product.

IRCC and TFDD water events databases
The Transboundary Freshwater Dispute Database (TFDD) International Water Event Database (IWED) (Yoffe et al., 2003, 2004; De Stefano et al., 2010b) provides information about international water basin interactions between 1948 and 2008; the International River Cooperation and Conflict database (IRCC) reports water related issues between 1997 and 2007 (Kalbhenn and Bernauer, 2012). Both databases are set up in the form of water related events at dyad-basin level. Each national territorial unit in a specific river basin is defined as a basin-country unit (BCU), and each of the possible pairs of BCUs in the same basin is classified as a dyad. Water related events (or interactions) are classified on the basis of a scale assigning a score representing the intensity of the issue and its nature (conflict/cooperation): +6 most cooperative, -6 most confrontational in the IRCC case (Kalbhenn and Bernauer, 2012); -7/+7 in the TFDD case (Yoffe et al., 2004; Yoffe and Larson, 2002). The interactions related to the same water issue involving two or more BCUs are clustered in a specific "case" (multiple events could be attributed to each water case), representing the hydro-political issue determining the interactions between the countries sharing a watershed. Although its temporal coverage is limited (11 years), the IRCC database was preferred in this analysis for the higher number of non-neutral interactions reported.
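The dyad structure described above can be made concrete with a short Python sketch. The country labels, basin name, and events are hypothetical placeholders; the sketch only illustrates how pairs of BCUs in a basin form dyads and how events can be counted per dyad, basin, and year.

```python
from collections import Counter
from itertools import combinations


def basin_dyads(countries):
    """All unordered pairs of countries (BCUs) sharing one basin."""
    return list(combinations(sorted(countries), 2))


def count_interactions(events):
    """Number of recorded events per (dyad, basin, year) combination."""
    return Counter(events)


# A hypothetical basin shared by three countries yields three dyads.
dyads = basin_dyads(["C", "A", "B"])

# Hypothetical event records keyed by (dyad, basin, year).
events = [
    (("A", "B"), "basin_x", 1999),
    (("A", "B"), "basin_x", 1999),
    (("A", "C"), "basin_x", 2001),
]
counts = count_interactions(events)
```

A basin shared by n countries gives n(n-1)/2 dyads, and combinations with no recorded event simply count as zero, which mirrors the large share of zero observations in the panel.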
Due to the nature of the algorithms used for the creation of the database, which mine water-coded events from international news datasets, the event data are characterized by an uneven geographical distribution of the observations. About 4800 of the 5881 observed events refer to the most represented international river basins, namely Danube, Nile, Zambezi, Mekong, Euphrates/Tigris, Ganges/Brahmaputra, Aral Sea, Elbe/Labe (Kalbhenn and Bernauer, 2012). In total, the IRCC dataset counts 15965 entries (5881 events and 10084 combinations of dyad countries-basin-year with no events); it presents data about 262 transboundary basins, 760 country dyads, and 1279 basin/dyad combinations (respectively 261, 725, and 1249 for the TFDD dataset) (Kalbhenn and Bernauer, 2012). Due to data limitations, 11 dyads and 2 minor basins (a total of 43 observations, 21 of which non-zero) were excluded 11 . After removing these observations, the IRCC panel retained 15922 entries (5860 events vs 10062 dyad-basin-year combinations with no events).
11 The excluded observations are those including Brunei Darussalam (a total of 43: 21 with interactions in the Mekong river for joint management in the context of the ASEAN political talks, and 22 without).
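The bookkeeping of the panel can be checked with simple arithmetic. All figures below are the ones reported in the text; only the variable names are introduced here for illustration.

```python
# Figures reported for the full IRCC dataset.
total_entries = 15965
events = 5881
no_event_combinations = 10084
basins, dyads = 262, 760

# Excluded observations (those including Brunei Darussalam).
excluded_with_events = 21
excluded_without_events = 22
excluded_basins, excluded_dyads = 2, 11

# Remaining panel after the exclusion.
remaining_events = events - excluded_with_events
remaining_no_events = no_event_combinations - excluded_without_events
remaining_entries = remaining_events + remaining_no_events
remaining_basins = basins - excluded_basins
remaining_dyads = dyads - excluded_dyads

# Observations used to train the RF model: zero and non-zero combinations.
model_observations = 10062 + 1739
```

The remaining 10062 no-event combinations, 260 basins, and 749 dyads match the figures given for the model panel. The 11801 training observations are consistent with the 5860 individual events being grouped into 1739 dyad-basin-year combinations with at least one interaction, although that grouping is an interpretation rather than something stated explicitly.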

Table A1
Descriptive statistics of the data used in the analysis.        (Welling et al., 2016).

Alternative water events databases
Alternative event datasets are currently available, namely: the Water and Conflict Chronology (Gleick, 1998; Gleick et al., 1994) 12 , a collection of historical conflicts where water was the object of the dispute, a side political goal, a military tool, or the object of terrorist attacks; the Water-Related Intrastate Conflict and Cooperation (WARICC) dataset (Bernauer et al., 2012a), providing information about water related events in 35 countries in the Mediterranean, Middle East, and Sahel areas; and the Issue Correlates of War River Claims Data Set (ICOW-River), collecting river management issue data about 82 dyads in 36 rivers, mainly in the Western Hemisphere and the Middle East (Hensel et al., 2006, 2008). These alternative databases were not taken into consideration in this analysis because of, respectively, the bias potentially arising from the non-homogeneous data collection methodology (in the case of the Water and Conflict Chronology) and the limited geographical coverage (in the cases of WARICC and ICOW-River). Moreover, the WARICC dataset is a collection of domestic water tension/cooperation events, which only in part overlaps the domain of the transboundary interactions analyzed here. However, in this study, the WARICC dataset was still considered as a source of information about the spatial distribution of critical water management hotspots within the selected basins.

Standardized Precipitation Index
The Standardized Precipitation Index (SPI) is a measure, expressed in standard deviation units, of the variation of the precipitation over a specific number of months with respect to the long-run average (WMO, 2012). The number of months over which the SPI is calculated usually varies between 3 and 48. Over shorter time scales, the SPI is considered a good indicator of variations in soil moisture, while over longer scales (up to 24 months) it can be associated with variations in groundwater or reservoir levels (WMO, 2012). Therefore, a short SPI (3 months) is often used to detect meteorological droughts; a medium SPI (6 months) is usually associated with agricultural droughts; and a longer SPI (12-24 months) is associated with hydrological droughts. SPI was calculated using the R package SPEI (Beguería and Vicente-Serrano, 2014).
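To make the idea of "standard deviation units over an accumulation window" concrete, the following is a minimal sketch in Python. It is not the procedure implemented in the SPEI package: the operational SPI fits a probability distribution to the accumulated precipitation and maps it to a standard normal, whereas this simplified version standardizes the accumulated totals directly and ignores seasonality. The function names and the precipitation series are hypothetical.

```python
from statistics import mean, pstdev


def rolling_sums(monthly_precip, window):
    """Accumulate precipitation over a trailing window of `window` months."""
    return [
        sum(monthly_precip[i - window + 1 : i + 1])
        for i in range(window - 1, len(monthly_precip))
    ]


def simplified_spi(monthly_precip, window=3):
    """Z-score of accumulated precipitation (a simplification of the SPI).

    Assumption: the totals are standardized directly against their long-run
    mean and standard deviation, instead of being fitted to a distribution
    and transformed to a standard normal as in the operational SPI.
    """
    totals = rolling_sums(monthly_precip, window)
    mu = mean(totals)
    sigma = pstdev(totals)
    if sigma == 0:
        raise ValueError("accumulated precipitation has zero variance")
    return [(t - mu) / sigma for t in totals]


# Hypothetical monthly precipitation series (mm), for illustration only.
precip = [80, 95, 60, 40, 20, 10, 5, 15, 45, 70, 90, 100,
          85, 90, 55, 35, 15, 5, 0, 10, 40, 75, 95, 105]

spi3 = simplified_spi(precip, window=3)
# Index (within spi3) of the driest 3-month window.
driest = min(range(len(spi3)), key=spi3.__getitem__)
print(driest, round(spi3[driest], 2))
```

Changing `window` to 6 or 12 gives the medium and long time scales discussed above; negative values indicate drier-than-average conditions.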

Climate modeling outputs: projections used in the study
The climate projection data used in this study belong to the NASA Earth Exchange Global Daily Downscaled Projections (NASA NEX-GDDP) dataset, downscaled (0.25 degrees) and bias corrected using the Bias-Correction Spatial Disaggregation (BCSD) methodology described in Thrasher et al. (2012). NEX-GDDP includes all 21 GCMs built in support of the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5). Due to computational constraints, we selected 5 of the 21 climate models on the basis of the structural differences among them, as described in Knutti et al. (2013). In particular, the following GCMs were chosen in this study: the National Center for Atmospheric Research

Annex B
Random Forest regression algorithm
RF is a Classification and Regression Tree (CART) based tool that involves an ensemble of regression trees (Breiman, 2001). The estimated trees, each a regression model fit using a subset of the input data and a portion of the independent variables, are then averaged, as in bagging (bootstrap aggregating) techniques (Breiman, 1994). Each tree is a noisy but virtually unbiased regression model: the aggregation procedure reduces the variance (Hastie et al., 2009).
The RF regression algorithm can be summarized in a few steps:
A. A random subset of the training data is drawn (each tree is trained using about 63%, ∼2/3, of the initial observations).
B. For each of the bootstrapped subsets, a tree is grown by recursively repeating three actions: select a random subset of m independent variables (m < p, p being the total number of independent variables); calculate the best variable/split among the m variables; generate two sub-nodes. This iteration is repeated until the minimum number of observations per node is reached. Each tree is tested against the remaining observations (about 37%) and the Out-of-Bag error (OOB, the mean prediction error on the observations left out of each training subset) is calculated; in the case of RF regression, this is the Mean Squared Error, MSE (Eq. 1).
C. The generated trees are aggregated in an ensemble and the overall MSE is calculated (Hastie et al., 2009).
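The 63%/37% split mentioned in step A is a property of bootstrap sampling rather than a tuning choice. The sketch below assumes that each tree draws n observations with replacement from n available ones (an assumption about the sampling scheme, not a detail stated in the text) and checks by simulation that the share of distinct observations drawn approaches 1 − 1/e ≈ 0.632. All names are illustrative.

```python
import math
import random


def in_bag_fraction(n_obs, rng):
    """Share of distinct observations in one bootstrap sample of size n_obs."""
    drawn = {rng.randrange(n_obs) for _ in range(n_obs)}
    return len(drawn) / n_obs


rng = random.Random(42)   # fixed seed so the simulation is reproducible
n_obs = 5000              # hypothetical number of training observations
n_trees = 50              # hypothetical number of bootstrap samples

fractions = [in_bag_fraction(n_obs, rng) for _ in range(n_trees)]
avg_in_bag = sum(fractions) / n_trees

# Limit of 1 - (1 - 1/n)^n as n grows: the expected in-bag share.
expected = 1 - math.exp(-1)

print(round(avg_in_bag, 3), round(expected, 3))
```

The complement, about 37% of the observations, is what remains out of the bag for each tree and is available for the OOB error estimate.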
The model is tuned through the calibration of three parameters: the number of trees (ntree); the number of variables randomly selected per split (mtry); and the minimum number of observations in the final node, often called leaf (nodesize) (Li et al., 2016; Malekipirbazari and Aksakalli, 2015). RF error is determined mainly by two factors: the correlation between the trees and the strength of the individual trees. The number m of randomly selected independent variables, by default a third of the total number of regressors (p), is crucial for both. In fact, a larger m (mtry) increases the strength of the individual trees but also the correlation between them, and vice versa (Breiman, 2001). Increasing the number of trees (ntree) stabilizes the model and reduces the overall error, up to a point beyond which the correlation between the trees starts to rise and the overall model performance decreases. Given its characteristics, RF is almost insensitive to the size of the final node (nodesize), and consequently to the depth of the trees (Segal, 2004). RF is particularly effective in capturing complex non-linear relations; it can handle a certain degree of multicollinearity between the independent variables; it is almost completely immune to overfitting; it is insensitive to outliers; and it does not require data pre-processing (Hastie et al., 2009). Moreover, RF is not sensitive to monotonic transformations of the independent variables, and there is no need to perform feature selection: RF automatically ignores the variables that do not ensure a good split.
This model has been successfully applied in many fields of study in which traditional statistical analysis is affected by multicollinearity and the independent variables are characterized by high covariance, for instance: genomics (Chen and Ishwaran, 2012), remote sensing (Jing et al., 2016; Rasquinha and Sankaran, 2016; Vogels et al., 2017), public health (Loidl et al., 2016), hydrology (Li et al., 2016; Mohr et al., 2017; Núñez et al., 2016), agriculture (Jeong et al., 2016), and ecological indicators (Pourtaghi et al., 2016). To the best of our knowledge, the assessment presented in this paper is the first application of an RF approach to a dyadic dataset in the context of international water interactions.
Since validation is performed internally while estimating the OOB error, RF does not need cross-validation. However, as in previous studies (Jeong et al., 2016; Li et al., 2016; Malekipirbazari and Aksakalli, 2015), we performed a 10-fold cross-validation to test the performance of the RF model in comparison with alternative algorithms, namely: Generalized Linear Model (GLM), Boosted Decision Trees (BDT), and Support Vector Machine (SVM). Model performance was compared by calculating the coefficient of determination R² (Eq. 2) and the Root Mean Squared Error, RMSE (Eq. 3). The final calibrated RF model was trained using the entire set of observations (N=11,801). Its performance was estimated by calculating the MSE (Eq. 1) and the pseudo R² (Eq. 4), a measure of the variation explained by the model (Kvålseth, 1985; Seber and Lee, 2003).
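The performance measures named above can be illustrated with a short sketch. The definitions used here are the standard ones and are an assumption, since the equations themselves are not reproduced in this text: MSE as the mean of the squared residuals, RMSE as its square root, and pseudo R² as one minus the ratio between the MSE and the variance of the observed values (taken here as the population variance). The data are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def variance(y):
    """Population variance of the observed values (divides by n, an assumption)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)


def pseudo_r2(y_true, y_pred):
    """Share of the observed variance explained: 1 - MSE / Var(y)."""
    return 1 - mse(y_true, y_pred) / variance(y_true)


# Hypothetical observed and predicted values, for illustration only.
y_obs = [0.0, 0.5, 1.0, 1.5, 2.0]
y_hat = [0.1, 0.4, 1.0, 1.6, 1.9]

print(round(mse(y_obs, y_hat), 4),
      round(rmse(y_obs, y_hat), 4),
      round(pseudo_r2(y_obs, y_hat), 4))
```

With this definition the pseudo R² can become negative when the model predicts worse than the mean of the observations.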
In Eq. 4, MSE represents the Mean Squared Error (Eq. 1) and Var(y) represents the variance of the observed values. An additional important feature of RF is the possibility to quantify the relative importance of each explanatory variable by estimating the variation in MSE when a specific independent variable is permuted (Breiman, 2001; Hastie et al., 2009). Given the random nature of the model, variable relative importance is rather volatile. Although more stable values can be achieved if the number of trees is sufficiently high, variable importance is likely to vary within a certain range, especially in the case of correlated variables (Altmann et al., 2010). Therefore, the variable ranking may change every time an RF model is run (Hastie et al., 2009; Strobl et al., 2007). To address this problem, our final tuned model was run 100 times and the variable relative importance estimates were presented as boxplots (Fig. A3). Variable interactions and final results of the model were presented using 2-dimensional and 3-dimensional partial dependence plots (Hastie et al., 2009; Welling et al., 2016). The analysis was performed using the statistical software R in combination with the packages randomForest (Liaw and Wiener, 2002), forestfloor (Welling et al., 2016), caret (Kuhn, 2008), and varSelRF (Diaz-Uriarte, 2014).
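The permutation logic behind variable importance can be sketched as follows. This is an illustration under simplifying assumptions, not the randomForest implementation: a fixed toy function stands in for the fitted model, and the MSE increase is measured on the full dataset rather than on the out-of-bag observations of each tree. The function names, the toy model, and the data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean Squared Error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats, rng):
    """Mean MSE increase after shuffling each column of X, one at a time."""
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: strong effect of x0, weak of x1, none of x2."""
    return 3.0 * row[0] + 0.5 * row[1]


rng = random.Random(0)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y, n_repeats=20, rng=rng)
print([round(s, 3) for s in scores])
```

Because each shuffle is random, single-run scores fluctuate; averaging over repeats, or reporting their distribution as in a boxplot, gives a more stable ranking.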
To validate our methodological choice, we tested the RF model performance in comparison with alternative statistical approaches: one linear model and two other machine learning algorithms. In the 10-fold cross-validation, the RF model outperformed the alternative methods by minimizing the error (mean RMSE = 0.218) and maximizing the coefficient of determination (mean R² = 0.679). The Generalized Linear Model (GLM) performed worst, followed by the Boosted Decision Trees (BDT) and the Support Vector Machine (SVM) (Fig. A1). To ensure comparability, all the statistical approaches were tuned with a multi-step procedure and the best performing parameters were selected for the final assessment. RF parameter calibration was performed by running the model repeatedly, minimizing the Mean Squared Error (MSE) and maximizing the variation explained (pseudo R²). Increasing the number of trees rapidly stabilized the MSE below 0.05 (at ntree∼150): the marginal performance gain above this value of ntree was found to be almost negligible. However, this parameter was finally set to 500 to give more robustness to the model. The number of variables randomly selected per split (mtry), by default set at 6 (∼p/3), was found to be optimal at a value of 8. The value of the minimum number of observations in the final node (nodesize) that minimized the error was found to be the default one (2 observations) (Fig. A2). This confirms the theoretical literature indicating that the selected model comfortably handles fully grown trees without overfitting.
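The cross-validated comparison described above follows a generic pattern that can be sketched independently of the specific algorithms. In the example below, two deliberately simple stand-ins (a mean-only predictor and a one-variable least-squares line) replace the RF, GLM, BDT, and SVM models, which are not reimplemented here; the data, the seeds, and all function names are hypothetical.

```python
import math
import random


def k_fold_indices(n_obs, k, rng):
    """Shuffle indices 0..n_obs-1 and split them into k near-equal folds."""
    indices = list(range(n_obs))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(fit, x, y, k, rng):
    """Mean RMSE over k held-out folds; `fit` returns a prediction function."""
    scores = []
    for held_out in k_fold_indices(len(x), k, rng):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - predict(x[i])) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / k


def fit_mean(x_train, y_train):
    """Baseline stand-in: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda _x: m


def fit_line(x_train, y_train):
    """Second stand-in: ordinary least squares with one regressor."""
    mx = sum(x_train) / len(x_train)
    my = sum(y_train) / len(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


# Hypothetical data: a noisy linear relation.
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(200)]
y = [2.0 * v + data_rng.gauss(0, 1) for v in x]

# The same seed gives both models identical folds, so the scores are comparable.
rmse_mean = cross_validated_rmse(fit_mean, x, y, k=10, rng=random.Random(2))
rmse_line = cross_validated_rmse(fit_line, x, y, k=10, rng=random.Random(2))
print(round(rmse_mean, 3), round(rmse_line, 3))
```

Evaluating every candidate on the same folds keeps the comparison fair, since differences in the score then reflect the models rather than the partition of the data.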
Annex C
Comparison with the spatial distribution of the water interactions recorded in the WARICC database
Our analysis highlights the areas of the transboundary basins where water interactions are more likely to arise. As an empirical validation of our results, we compared the spatial distribution of our index with geospatial data about water interactions. The only dataset of historical water-related events available at fine (sub-dyadic) resolution is the Water-Related Intrastate Conflict and Cooperation (WARICC) dataset (Bernauer et al., 2012a), which classifies water events recorded in 35 countries in the Mediterranean, the Middle East, and the Sahel in the period 1997-2009. A comparison between the spatial distribution of the water events classified in the WARICC dataset and the likelihood of hydro-political interactions presented here is shown in Fig. A6. It should be noted that our index was built by studying transboundary water interactions, while the WARICC database is a repository of domestic conflict and cooperation events over water. The spatial distribution of the events corresponds to the highest values of our index for the majority of the areas under consideration, especially within the boundaries of the main transboundary river basins. The correspondence between the presented index and the events outside the boundaries of the main river basins, especially in the Adriatic portion of the Balkans, is less evident. However, the high concentration of WARICC reported events in the Balkan area could be linked to the civil conflicts that followed the collapse of the Socialist Federal Republic of Yugoslavia (1992, with the civil war spanning a decade between 1991 and 2001). A similar bias could affect the WARICC event data at the border between Eritrea and Ethiopia, which were involved in a territorial dispute between 1998 and 2000.

Table A3
Normalized likelihood of hydro-political interaction, future scenarios, and main determining factors per river basin.