Global assessment of agreement among streamflow projections using CMIP5 model outputs

Runoff outputs from 11 atmosphere–ocean general circulation models (AOGCMs) participating in the fifth phase of Coupled Model Intercomparison Project were used to evaluate the changes in streamflow and agreement among AOGCMs at the end of 21st century. Under the highest emission scenario (Representative Concentration Pathways (RCP) 8.5), high flow is projected to increase in northern high latitudes of Eurasia and North America, Asia, and eastern Africa, while mean and low flows are both projected to decrease in Europe, Middle East, southwestern United States, and Central America. Projected changes under RCP4.5 show similar spatial distribution but with lower magnitude. The model spread of projected changes, however, is found to be large under both scenarios. Bootstrapped Mann–Whitney–Wilcoxon U test revealed that projected changes of streamflow regimes are statistically not significant in 8–32% (19–59%) of the world under RCP8.5 (RCP4.5). The model agreement on projected increase or decrease in mean and high flows is stronger under RCP8.5 than that under RCP4.5. On the other hand, the projected changes in low flow are robust in both scenarios with strong model agreement. In ∼7% (4%) of the world, high flow is projected to increase and low flow is projected to decrease, whereas in ∼29% (13%) all mean, high, and low flows are projected to increase under RCP8.5 (RCP4.5).


Introduction
Climate change driven by anthropogenic greenhouse gas emissions will alter freshwater resources (IPCC 2014), which might adversely stress their availability and use (e.g., Vorosmarty et al 2000, Arnell 2004, Gerten et al 2013, Haddeland et al 2014), and increase the risks associated with changes in runoff and streamflow (e.g., Hirabayashi et al 2008, Arnell and Lloyd-Hughes 2014). Under climate change, average and high flows are projected to increase in Asia, Eurasia, and the high latitudes of North America, and to decrease in Europe and North America (e.g., Milly et al 2002, Nohara et al 2006, Dankers and Feyen 2008, 2009, Hirabayashi et al 2008, Kundzewicz et al 2010, Dankers et al 2014, Davie et al 2013). Further, low flow is also projected to increase across northern high latitudes and Asia, and to decrease in Europe and South America (e.g., Hirabayashi et al 2008, Doll and Schmied 2012), while the number of hydrological drought days within a year is projected to increase in most regions of the world (Prudhomme et al 2014).
Of the studies analyzing changes in runoff/streamflow under climate change, some employ land surface models (LSMs) or global hydrological models (GHMs) forced by bias-corrected input data from atmosphere-ocean general circulation models (AOGCMs) to simulate runoff (Schewe et al 2014), while others directly use the runoff outputs from AOGCMs (Milly et al 2005, Nohara et al 2006, Hirabayashi et al 2008). In addition to the uncertainties associated with the AOGCMs' projections of precipitation (Maurer and Duffy 2005, Knutti and Sedlacek 2013), streamflow simulated with an LSM/GHM might depend strongly on the hydrological model used (Haddeland et al 2011, Hagemann et al 2013, Schewe et al 2014) as well as on the bias correction method applied to the input data of the LSMs/GHMs (Hagemann et al 2011). The projected changes of streamflow-related variables, therefore, can not only differ greatly in magnitude (as shown in Doll and Schmied 2012), but can also have completely opposite directions depending on the model(s) and data used (Hirabayashi et al 2013), contributing to low confidence in projections of extreme flows (Field et al 2012). It is, hence, desirable to evaluate changes in freshwater availability with a large number of models (Milly et al 2008) and to evaluate model agreement in terms of the number of models or simulations showing similar changes (Arnell and Gosling 2013, Hirabayashi et al 2013).
As reported in IPCC (2014) This study, therefore, analyzes the changes in extreme and average streamflow indicators (average, high, and low flows) under the new Representative Concentration Pathway (RCP) radiative forcing scenarios (Vuuren et al 2011) using the most recent outputs of 11 AOGCMs, from independent modeling institutes, participating in CMIP5. The aim is to use the maximum number of available AOGCMs for multiple scenarios to project changes in streamflow and analyze the spread and agreement between different AOGCMs, as is, without adding aforementioned sources of uncertainty.
In addition, previous studies projecting changes in streamflow either evaluate significance by comparing the magnitude of change to the standard deviation of streamflow under natural climate variability (Arnell and Gosling 2013, Arnell et al 2013) or ignore it completely (Nijssen et al 2001, Doll and Schmied 2012). The natural climate variability, calculated using long-term simulations of AOGCMs without anthropogenic forcings, can contribute to the uncertainty in evaluating statistically significant changes. Further, if the distribution of data is unknown, as in the case of many hydrological variables under climate change, a combination of a resampling method such as bootstrapping and a distribution-free non-parametric statistical test, which provides a powerful and robust test for hydrological changes (Yue and Pilon 2004), is recommended (Kundzewicz and Robson 2004). A bootstrap resampling approach was used by Hirabayashi et al (2013) to quantify the probability of occurrence (number of times out of 1000 bootstrap resampled data pairs) of changes in large floods. To test the statistical significance of projected changes in extreme and average streamflow, this study proposes an alternative approach using a bootstrapped non-parametric statistical test. Previous studies (Kundzewicz and Robson 2004, Mudelsee 2010) suggest that bootstrap methods have the following advantages over classical analytical tests of significance:
• Testing the change between a pair of samples of relatively small size.
• Filtering out the effect of outliers (e.g., rare extreme events) in the data sample.
• Not requiring an assumption about the distribution of the data.

Data and methods
To analyze the future changes in variability of freshwater resources under climate change, the latest daily total runoff outputs of 11 AOGCMs, from independent modeling institutes, participating in CMIP5 are used to project the changes in extreme and average streamflow. In this study, two CMIP5-AOGCM simulations were acquired: historical simulations (1850-2005) forced by natural (e.g., volcanic, solar) and anthropogenic (e.g., greenhouse gases, ozone) forcings, and future simulations (2006-2100) forced by the RCP scenarios (Vuuren et al 2011). The RCPs span a range of radiative forcing from 2.6 to 8.5 W m⁻² and represent various possible climate outcomes (Moss et al 2010). This study only uses the results for the most extreme (RCP8.5) and a moderate (RCP4.5) emission scenario. Only models from independent modeling institutes (table 1) were selected, based on the availability of daily runoff output for the historical and the two selected RCP scenarios. The main text of this manuscript focuses on the RCP8.5 scenario, as it shows the maximum potential changes. The projection under RCP4.5 is also briefly discussed to summarize the statistical significance of the projection under a lower radiative forcing.
To simulate river discharge, the runoff from AOGCMs is integrated horizontally along a prescribed river network using a state-of-the-art global river routing model, the Catchment-based Macro-scale Floodplain Model (CaMa-Flood; Yamazaki et al 2011). As CaMa-Flood considers the floodplain inundation dynamics based on the difference of water levels, it represents temporal and spatial variations of streamflow more reasonably (Yamazaki et al 2012) than previous river routing models (Arora and Boer 1999, Hirabayashi et al 2008).
Environ. Res. Lett. 9 (2014) 064017, S Koirala et al
The daily AOGCM runoff from 1960 to 2100 was spatially interpolated from the original resolutions (specified in table 1) to 15' × 15' to match the high-resolution global river network map employed in CaMa-Flood. A bilinear interpolation was preferred to a simple re-gridding method as it provides a realistic spatial gradient rather than patches of the same runoff value from one AOGCM grid cell in multiple CaMa-Flood grid cells. Comparison of the runoff disaggregated by the two methods showed that the difference in multimodel mean runoff is <10% in 67% of global grid cells. When the runoff is integrated to discharge, the difference propagates to downstream river channels, but the difference in magnitude of multimodel mean discharge is still within 10% in ∼70% of the world, and becomes negligible in larger river basins. The effect of interpolation of runoff and the evaluation of streamflow against GRDC observations are discussed in online supplementary information S1 available at stacks.iop.org/ERL/9/064017/mmedia. To evaluate the impact of climate change, river discharge simulations for two 30 year periods, 1971 to 2000 and 2071 to 2100, were selected to represent the past (20C) and future (21C) conditions, respectively. Three indicators of streamflow regimes corresponding to long-term availability of streamflow (Qm: mean annual streamflow) and extreme flow (high flow Q5: exceeded 5% of the time, and low flow Q95: exceeded 95% of the time within a year) were selected. For each model, the mean, high, and low flows in each year were calculated from the daily discharge simulation, resulting in 30 values for each 30 year period. The Qm, Q5, and Q95 in 20C and 21C were then calculated by averaging the respective 30 values.
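The indicator calculation described above can be illustrated with a minimal Python sketch. It assumes that "exceeded 5% of the time" corresponds to the 95th percentile of the daily values in a year (and "exceeded 95%" to the 5th percentile), and it uses NumPy's default linear-interpolation percentile estimator; the function names and the synthetic discharge series are hypothetical and are not taken from the study.

```python
import numpy as np

def annual_indicators(daily_q):
    """Return (mean flow, high flow Q5, low flow Q95) for one year of daily discharge."""
    q = np.asarray(daily_q, dtype=float)
    q_mean = q.mean()
    # Flow exceeded 5% of the time = 95th percentile of the daily values (assumed estimator).
    q5 = np.percentile(q, 95)
    # Flow exceeded 95% of the time = 5th percentile of the daily values.
    q95 = np.percentile(q, 5)
    return q_mean, q5, q95

def period_indicators(daily_by_year):
    """Compute annual indicators for each year, then average them over the period.

    Returns the (n_years, 3) array of annual values and the 3 period averages.
    """
    annual = np.array([annual_indicators(year) for year in daily_by_year])
    return annual, annual.mean(axis=0)

# Synthetic example: 30 years of made-up daily discharge at one grid cell.
rng = np.random.default_rng(0)
years = [rng.gamma(shape=2.0, scale=50.0, size=365) for _ in range(30)]
annual, (qm, q5, q95) = period_indicators(years)
```

The (30, 3) array of annual values is kept alongside the period averages because the significance test operates on the 30 annual values rather than on their mean.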
In the analysis, multimodel mean is presented instead of multimodel median for direct and fair comparison with previous studies as well as to avoid potential inconsistency in choosing a different model (as median model can be different for different streamflow regimes) while analyzing the directions of changes of different streamflow regimes together (section 4).
For testing statistical significance, the bootstrapped Mann-Whitney-Wilcoxon U (MW-U) test (Mann and Whitney 1947, Wilcoxon 1950) was applied to each AOGCM. The selected MW-U test is a non-parametric rank-based statistical test, which does not require any assumption about the statistical distribution of the data. For each AOGCM, the MW-U test statistic was first calculated using the 30 original data values from 20C and 21C. Then, 1000 bootstrap samples of data pairs (one each for 20C and 21C) were generated using a random number generator. As overlaps are allowed in bootstrapping, every one of the 1000 samples contains 30 values drawn at random, with replacement, from the 30 original data values. The MW-U test statistic was then calculated for each bootstrap sample. Next, the rank of the MW-U value from the original data was located within the 1000 MW-U values from the bootstrap samples. From this rank, the non-exceedance probability of the original data was calculated. Under a 5% level of significance, if the non-exceedance probability is <0.025, there is a significant increase in the future, and if it is >0.975, there is a significant decrease. The bootstrapping procedure and the calculation of the test statistic are presented in detail in online supplementary information S3.
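A rough Python sketch of such a bootstrapped rank test is given below. The exact resampling scheme is documented only in supplementary information S3, so several choices here are assumptions rather than the authors' procedure: bootstrap pairs are drawn from the pooled 20C and 21C values (i.e., under a null hypothesis of no change), U is counted for the 20C sample so that a small U means larger future values, and the non-exceedance probability is the fraction of bootstrap U values not exceeding the original U. The function names are hypothetical.

```python
import numpy as np

def mw_u(x, y):
    """U statistic of sample x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 0.5."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float(np.sum(x > y) + 0.5 * np.sum(x == y))

def bootstrap_mw_u(past, future, n_boot=1000, alpha=0.05, seed=0):
    """Locate the original U within a bootstrap distribution of U values."""
    past = np.asarray(past, dtype=float)
    future = np.asarray(future, dtype=float)
    rng = np.random.default_rng(seed)
    u_orig = mw_u(past, future)

    # ASSUMPTION: resample both members of each pair from the pooled data (null of no change).
    pooled = np.concatenate([past, future])
    u_boot = np.empty(n_boot)
    for b in range(n_boot):
        boot_past = rng.choice(pooled, size=past.size, replace=True)
        boot_future = rng.choice(pooled, size=future.size, replace=True)
        u_boot[b] = mw_u(boot_past, boot_future)

    # Non-exceedance probability of the original statistic among the bootstrap values.
    p_non_exc = float(np.mean(u_boot <= u_orig))
    if p_non_exc < alpha / 2:
        verdict = "increase"          # future values rank significantly higher
    elif p_non_exc > 1 - alpha / 2:
        verdict = "decrease"          # future values rank significantly lower
    else:
        verdict = "not significant"
    return u_orig, p_non_exc, verdict
```

With this convention the thresholds match the text: a non-exceedance probability below 0.025 flags a significant increase, and one above 0.975 flags a significant decrease.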
Once the statistical significance of change in streamflow regimes was tested for each AOGCM, the agreement among the AOGCMs (multimodel consistency) was calculated as the number of models, out of 11, showing a statistically significant increase, a statistically significant decrease, or no significant change. For simplicity of presentation and discussion, the degree of consistency (DOC) is defined as 'strong' when 10 or more AOGCMs show a significant change of the same direction (sign), 'moderate' when 8 or 9 AOGCMs do, 'weak' when 6 or 7 AOGCMs do, and 'insignificant' when 6 or more AOGCMs show that the projected change is not significant.
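The DOC classes above translate directly into a small lookup, sketched here in Python. The function name and the 'undefined' label, used when no outcome is shared by at least 6 of the 11 models, are assumptions introduced for illustration; the text does not name that case.

```python
def degree_of_consistency(n_increase, n_decrease, n_not_significant, n_models=11):
    """Classify multimodel agreement at one grid cell from per-model test outcomes."""
    if n_increase + n_decrease + n_not_significant != n_models:
        raise ValueError("model counts must sum to n_models")

    # 6 or more models without a significant change: 'insignificant'.
    if n_not_significant >= 6:
        return "insignificant", None

    # Otherwise look at the direction shared by the larger group of models.
    n_same, sign = max((n_increase, "increase"), (n_decrease, "decrease"))
    if n_same >= 10:
        return "strong", sign
    if n_same >= 8:
        return "moderate", sign
    if n_same >= 6:
        return "weak", sign
    # ASSUMPTION: no majority of any kind; this case is not named in the text.
    return "undefined", None
```

With 11 models, at most one of the three outcomes can be shared by 6 or more models, so the order of the checks does not affect the result.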

Projection of changes in streamflow
The relative change of the multimodel mean of the 30 year averages of the 20th century (1971-2000, 20C) and 21st century (2071-2100, 21C) indicators of streamflow regimes under the RCP8.5 scenario is presented in figure 1. In addition, the coefficient of variation of changes (CoV, ratio of standard deviation to mean of the changes projected by each model) projected by the 11 AOGCMs is presented in figure 2 to analyze the spread of projected changes. The mean streamflow (Qm) increases in 68.9% and decreases in 31.1% of global land grid cells, excluding those in Greenland and Antarctica. The increase is widespread in the northern high latitudes of North America and Eurasia, Asia, Africa and Australia, and parts of eastern South America (figure 1(a)). On the other hand, a decrease is projected in most regions of Europe, Middle East, Central Asia, northern and southern Africa, southwestern United States, and Central and South America. The magnitude of change, however, differs between regions. In 5.5% of the world, the increase in 21C is >50% of 20C, and 0.2% has a decrease of the same magnitude. The CoV of change in Qm projected by different models is presented in figure 2(a). The CoV is >1 (i.e., standard deviation greater than mean) in 67.8% of the world (40.8% with increase and 27% with decrease in Qm). In the high latitudes of northeast Eurasia and North America with a large increase in Qm, the spread among models is relatively small with CoV < 1. In Europe and southern United States with a decrease in Qm, the CoV is <1 as well. Understandably, the CoV is relatively large for regions with small change in multimodel mean Qm.
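The two quantities mapped in figures 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the relative change is taken between the multimodel means of the two periods, the CoV is computed from per-model relative changes using the population standard deviation and the absolute value of their mean, and all cells are assumed to have non-zero 20C flow. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def relative_change_and_cov(q_20c, q_21c):
    """q_20c, q_21c: arrays of shape (n_models, n_cells) holding 30 year averages."""
    q_20c = np.asarray(q_20c, dtype=float)
    q_21c = np.asarray(q_21c, dtype=float)

    # Relative change (%) of the multimodel mean between the two periods.
    mm_20c = q_20c.mean(axis=0)
    mm_21c = q_21c.mean(axis=0)
    rel_change = 100.0 * (mm_21c - mm_20c) / mm_20c

    # Spread: standard deviation over |mean| of the changes projected by each model.
    per_model = 100.0 * (q_21c - q_20c) / q_20c
    cov = per_model.std(axis=0) / np.abs(per_model.mean(axis=0))
    return rel_change, cov

# Toy example: 3 models, 2 grid cells with the same mean change but different spread.
q20 = np.full((3, 2), 100.0)
q21 = np.array([[110.0, 150.0], [120.0, 90.0], [130.0, 120.0]])
rel_change, cov = relative_change_and_cov(q20, q21)
```

In the toy example both cells show a 20% increase in the multimodel mean, but only the second cell, where one model projects a decrease, has CoV > 1.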
The spatial patterns of the projected change in multimodel mean high flow (Q5) and that of Qm are similar in Eurasia, Asia, and Africa (figure 1(b)). Q5 is projected to increase in 59.5% and decrease in 40.5% of the world. In the high latitudes of northwestern Eurasia and North America, where the mean flow is projected to increase, high flow is projected to decrease. An increase of >50% is shown in 5.9% of the world, and a decrease of similar magnitude in 0.5%. In terms of model spread, 82.4% (45.8% with increase and 36.6% with decrease in Q5) of the world has CoV > 1 (figure 2(b)), which is a larger share than for mean flow.
Finally, the spatial distribution of changes in low flow (Q95) is presented in figure 1(c). Q95 increases in 65.4% and decreases in 34.6% of the world. In contrast to changes in Qm and Q5, Q95 decreases in most regions of South America, northeastern India, wider regions of southern Africa, and China. Also, the magnitude of relative change is higher than those of Qm and Q5: 27.8% (1.9%) of the world shows an increase (decrease) of >50%. The model spread of projected change in Q95 is much larger, with CoV > 1 in 94% of the world (figure 2(c)).
Under the RCP4.5 scenario (online supplementary information S2), the spatial distributions of the changes are similar to those under RCP8.5. The magnitudes of relative changes of Qm and Q5, on the other hand, are much smaller than those under RCP8.5. The magnitude of relative change in Q95 is similar under both scenarios. The spread among model projections under RCP4.5 is similar to or even slightly larger than that under RCP8.5. The standard deviation is larger than the mean of projected changes in Qm, Q5, and Q95 in 73.1%, 87.6%, and 95.6% of the world, respectively.

Statistical significance and agreement among model projections
In this study, data from 30 year periods are used to represent the condition of streamflow in the 20th (1971-2000) and 21st (2071-2100) centuries. The occurrence of outliers within a 30 year period can easily affect the average of the 30 year data, especially if the magnitude of change is relatively small. To analyze the uncertainties associated with the projections and test the statistical significance of projected changes, a bootstrapped MW-U test is applied to each AOGCM and the agreement among models in showing statistically significant changes is calculated. To address the uncertainty caused by the selection of a short period of data, 1000 bootstrap samples for each model were used for the test of significance. The agreement among model projections is discussed in this section, while the results of the significance test for each AOGCM under both RCP8.5 and RCP4.5 scenarios are presented in online supplementary information S3.
The DOC (see section 2) of significant changes in Qm, Q5, and Q95 under the RCP8.5 scenario is presented in figure 3. The DOC for increase in Qm is moderate to strong in 31% of the world, including regions of Eurasia, South and Southeast Asia, high latitudes of North America, and Eastern Africa. Similar moderate to strong DOC for decrease in Qm is projected in 6% of the world, in the Iberian Peninsula, Middle East, southern South America, and southern United States (figure 3(a)). In most regions of Brazil, North America, Europe, southern Africa, and eastern China, the changes projected by 6 or more AOGCMs are not statistically significant. The increase in Q5 has moderate to strong DOC in 18% of the world. Mostly, the spatial distribution of increase is similar to that of Qm, except for a smaller area in high latitudes of North America, Alaska, and Europe. Similarly, 5% of the world has moderate to strong DOC for decrease in Q5, while the projected changes are statistically not significant in 32% of the world.
Further, the increase in Q95 has moderate to strong DOC in 36% of the world, mainly in the northern high latitudes of North America and Eurasia, and China. Moderate to strong DOC for decrease in Q95 is found in 11% of the world, which is relatively larger than the corresponding areas for decrease in Qm and Q5. Projections in most areas of southern United States, Central and South America, Europe, and southern Africa have such strong DOC for decrease in Q95. Weak DOC for decrease in Q95 can be seen in eastern China and Australia, where the changes in Q5 and Qm were mostly not significant. The projected changes in Q95 are statistically not significant in only 8% of the world.
Under the RCP4.5 scenario (online supplementary information S4), the projected changes of Qm, Q5, and Q95 are statistically not significant in 48%, 59%, and 19% of the world, respectively. The spatial patterns of DOC for projected changes in Qm and Q5 are similar to those in RCP8.5, but many models agree on the changes not being significant under the RCP4.5 scenario. As such, regions with strong DOC under RCP8.5 either have moderate DOC or have more than 6 models showing no significant change under RCP4.5. The projected change in Q95, on the other hand, has moderate to strong DOC for increase and decrease in 26% and 8% of the world, respectively.
In some regions, different directions of changes are projected for different indicators of streamflow regimes (figure 1). The directions of changes of low flow (Q95) and high flow (Q5) under the RCP8.5 scenario are compared in figure 4. In 42% of the world, Q5 and Q95 are projected to have the same direction of change (29% increase and 13% decrease). In 7% of the world, Q5 is projected to increase while Q95 is projected to decrease, suggesting a potential increase of both riparian flood and drought in the future. In 14% of the world, Q5 shows a decrease and Q95 shows an increase, suggesting a reduction in the extremity of streamflow. In 37% of the world, projected changes of either one (34%) or both (3%) of Q5 and Q95 are statistically not significant. Projected changes of mean annual streamflow (Qm) and high flow (Q5) have the same direction in 51% of the world (35% increase and 16% decrease).
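The bookkeeping behind such a joint comparison can be sketched in Python as below. The sketch assumes that each grid cell already carries a single label per indicator (+1 for a significant increase, -1 for a significant decrease, 0 for not significant) and that percentages are unweighted fractions of grid cells; how the study derives the per-cell direction from the 11 models and whether it weights cells by area are not specified here, so both are assumptions. The function name and category labels are hypothetical.

```python
import numpy as np

def joint_direction_fractions(sig_q5, sig_q95):
    """Percentage of cells in each joint category of high-flow and low-flow change.

    sig_q5, sig_q95: per-cell labels, +1 increase, -1 decrease, 0 not significant.
    """
    a = np.asarray(sig_q5)
    b = np.asarray(sig_q95)
    categories = {
        "both increase": (a == 1) & (b == 1),
        "both decrease": (a == -1) & (b == -1),
        "Q5 up, Q95 down": (a == 1) & (b == -1),
        "Q5 down, Q95 up": (a == -1) & (b == 1),
        "one not significant": (a == 0) ^ (b == 0),
        "both not significant": (a == 0) & (b == 0),
    }
    # ASSUMPTION: unweighted fraction of grid cells, expressed in percent.
    return {name: 100.0 * float(mask.mean()) for name, mask in categories.items()}

# Toy example with 8 grid cells.
fractions = joint_direction_fractions(
    [1, 1, -1, -1, 0, 0, 1, 0],
    [1, -1, -1, 1, 1, 0, 1, -1],
)
```

The six categories are mutually exclusive and exhaustive, so the percentages always sum to 100, as the reported 42%, 7%, 14%, and 37% do.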
Under the RCP4.5 scenario, when the directions of changes in indicators of streamflow regimes are analyzed together, the projected change in at least one of the variables is statistically not significant in 66% of the world. In 4% of the world, Q5 is projected to increase while Q95 is projected to decrease.

Discussions
The spatial patterns of projected change in the selected streamflow regimes, in general, correspond to the spatial pattern of change in mean (in the case of mean flow and low flow) and extreme (in the case of high flow) precipitation projected by CMIP5 models (Kharin et al 2013, Knutti and Sedlacek 2013). The largest increases in high and mean flow are projected in northern high latitudes, while the largest decrease in low flow is projected in Europe, South America and Middle East. The spatial distributions of projected changes correspond to the changes in large flood (Dankers et al 2014, Hirabayashi et al 2013) and average flow (Schewe et al 2014) using CMIP5 models. The spread among AOGCMs is large in regions with small change in multimodel mean, with larger variability in projected low flow than in mean and high flows. Even though the spatial distributions of the projected changes under the RCP8.5 and RCP4.5 scenarios are similar, the magnitude of change is much smaller under RCP4.5. On the other hand, the relative spread among AOGCMs is similarly large under both scenarios.
The projected changes in selected indicators of streamflow regimes are statistically not significant in 8-32% (19-59%) of the world under the RCP8.5 (RCP4.5) scenario. Compared to studies defining significant change as a change greater than the standard deviation due to internal climatic variability (e.g., Arnell and Gosling 2013), the percentage of global area, where changes are not significant, is relatively larger. The difference is mainly in the regions with relatively lower magnitude of projected change. The bootstrap resampling method, which filters out effect of outliers within data, along with relatively strict 5% level of significance might have resulted in the projected changes to be not significant in larger areas than previous studies. The method used in this study, however, does not separate the effect of internal climate variability. As both methods have their merits, it might be advantageous to consider the internal variability of climate alongside bootstrap method.
The agreement among AOGCMs for projection of high flow is relatively stronger in the northern high latitudes and weaker in the tropical and subtropical regions. On the other hand, the projected decrease in high flow has moderate to strong consistency in ∼5% of the world, mainly corresponding to regions where the decrease in precipitation is robust (Knutti and Sedlacek 2013). Agreement in projections of decrease in low flow is stronger than that of high flow because the projected changes in extreme precipitation have larger uncertainty, especially in the tropical regions (Kharin et al 2013). In parts of South America, where the consistencies among AOGCMs are relatively weak for projected changes in mean and high flows, the consistency for projected decrease in low flow is relatively strong. The agreement among models in projecting an increase in mean and high flows is relatively strong in monsoon regions of eastern Africa, South Asia and Southeast Asia. In summary, as concluded in both Fourth and Fifth Assessment Reports of Climate Change (IPCC 2007, 2013), the projections are more robust for an increase in runoff/streamflow in the northern high latitudes of North America and Eurasia, and for a decrease in Europe and southwestern United States. The change in long-term mean streamflow is driven by the change in high flow, with the same direction of projected change in 51% of the world. In large parts of South America and Africa (7% of the world), high flow is projected to increase and low flow is projected to decrease. Even though these regions correspond to regions where population is also projected to increase, it is recommended that assessments of the risk to human population and property associated with the projected changes in streamflow incorporate the subgrid variability of changes (Hirabayashi et al 2013) and a multitude of socioeconomic indicators (Field et al 2012, Arnell 2013, Ward et al 2013).
In this study, the original AOGCM data were used without correction for potential biases, as the objective was to evaluate the spread of projected changes and agreement among AOGCMs. Despite the biases in runoff, its effect on direction of change in streamflow should be minimal. An extra analysis (not shown here), using the runoff corrected by GSWP-2 multimodel mean runoff (Dirmeyer et al 2006), produced similar spatial pattern of projected changes and model agreement in most regions globally. Further, the projections of each AOGCM were assumed to be equally plausible and performance metrics (as in Gleckler et al 2008) were not used to filter out models with weak performance.
This study used a state-of-the-art river routing model, CaMa-Flood, to integrate runoff to streamflow. Even though CaMa-Flood represents the inundation dynamics in a realistic way (Yamazaki et al 2012, Hirabayashi et al 2013), it does not consider anthropogenic regulation of rivers. Therefore, the projections presented here correspond to potential changes in streamflow under natural conditions. Under anthropogenic water use, the relative change in streamflow can be expected to be higher than that under natural conditions (as in Doll and Schmied 2012). Also, the AOGCM runoff data were spatially interpolated from the original spatial resolution to match the CaMa-Flood resolution. A comparison of bilinear interpolation with a simple re-gridding method showed that the difference in mean flow is <10% in ∼70% of the world.

Conclusions
The changes in streamflow under RCP8.5 and RCP4.5 emission scenarios were projected using the latest daily total runoff outputs of 11 independent AOGCMs participating in CMIP5. In general, at the end of 21st century, long-term mean, high, and low streamflow are all projected to increase in northern North America, northern Eurasia, Asia, eastern and central Africa and Australia, while they are all projected to decrease in regions of Europe, Middle East, Central Asia, northern and southern Africa, southwestern United States and Central America.
The spatial distributions of projected changes in mean, high, and low flows are similar under the RCP4.5 and RCP8.5 scenarios. The magnitudes of relative change in mean and high flows are much lower under RCP4.5, while the magnitudes of change in low flow are similar under both scenarios. Further, the spread among AOGCMs is large and similar for projections of mean, high, and low flows under both scenarios, which suggests that the model spread, at least for streamflow, is relatively insensitive to the level of radiative forcing.
The statistical significance of the projected changes was evaluated using a bootstrapped MW-U test rather than commonly used method based on comparing changes with natural climate variability. Depending upon the AOGCM and scenario, the projected changes were found to be statistically not significant in slightly larger areas than reported previously. When the results of all AOGCMs were combined to express the agreement among models, the projected changes from 6 or more AOGCMs were statistically not significant in 8-32% (19-59%) of the world under the RCP8.5 (RCP4.5) scenario, which highlights the need for test of significance to improve confidence in projections.
Under the RCP8.5 scenario, AOGCMs have large agreement (strong model consistency) in the regions with projected increase (northeastern Eurasia, northern North America, and eastern Africa) as well as projected decrease (Europe, Middle East, and southwestern United States) in mean and high flow. The agreement in projections of mean and high flows is weaker under RCP4.5 as projected changes by most AOGCMs are statistically not significant. On the other hand, under both RCPs, the AOGCMs have relatively strong agreement on the projected changes in low flow in larger regions.
Due to the correlated nature of changes (with direction of change) in different streamflow regimes and the difference in their potential impact on humans and society, water resources assessment under climate change should be based on a comprehensive analysis of streamflow variability rather than on a single aspect of the streamflow regime.