Streamflow stationarity in a changing world

Whether river flows remain stationary is of great concern to hydrologists, water engineers, and society in general, yet is subject to substantial debate. Here we provide the first comprehensive assessment of the long-term stationarity of annual streamflow for 11 069 catchments globally. Our observation-based evidence shows that the long-term annual streamflow remains stationary in 79% of catchments with minimal human disturbance, indicating that historical climate change alone has not led to non-stationarity in annual streamflow series in most catchments. In direct contrast, we found streamflow has remained stationary in only 38% of those catchments where substantial human interventions have occurred. These results demonstrate the scale of the human impact on the freshwater system, and highlight the ongoing need for dealing with the impacts of direct human interventions to ensure successful water management into the future.


Introduction
The assumption of stationarity is currently the foundation on which the planning and design of water resource systems is based (Bras and Rodriguez-Iturbe 1985, Chow et al 1988, Cheng et al 2014, Cheng and AghaKouchak 2015. The idea assumes that the statistics of an observed historical time series of streamflow can be used for future planning (Kendall et al 1983, Wilks 2011. Despite its long and ongoing application in water resources engineering, in the last decade there have been a number of important debates about whether the stationarity assumption is still a valid basis for water resources planning (Milly et al 2008, Villarini et al 2009, Lins and Cohn 2011, Matalas 2012, Montanari and Koutsoyiannis 2014. Several studies have argued that 'stationarity is dead' , because with ongoing climate change and intensified human activities, future streamflow is unlikely to remain stationary (Milly et al 2008, Villarini et al 2009. Alternatively, several studies have noted that the impacts of a nonstationary climate on hydrology are highly uncertain and that the long-term behavior in many hydrological systems has important time invariant characteristics (Lins and Cohn 2011, Matalas 2012, Montanari and Koutsoyiannis 2014. This critical scientific debate is still ongoing because we currently lack a rigorous global observation-based assessment of streamflow stationarity.
Here we use a large long-term observational database of near-continuous daily and/or monthly streamflow (Q) at 11 069 gauging stations (see section 2) to systematically assess the number of catchments where Q has remained stationary. The database is temporally heterogeneous because the underlying streamflow data were not collected for the purpose of global monitoring. Despite that, it includes streamflow records with lengths varying from 30 to 154 years (mean is about 57 years) and also includes data from all permanently inhabited continents although the data are concentrated in North America and Europe (supplementary figure S1 (available online at stacks.iop.org/ERL/16/064096/ mmedia)). Here we use that database to assess the stationarity in annual Q as this is directly relevant for water resources and for agricultural (irrigation) management.
Conceptually, the implication of a stationary time series is that the (population) mean, variance (and higher-order moments) remain constant over time. The central challenge is that we can only ever have a sample from the population. In practice, the most widely used approach in hydrologic (and climate) studies is to assess whether the mean has changed using well established methods for trend analysis (Kendall et al 1983, Kendall 1990, Milly et al 2005. This widely used approach is ultimately founded on the central limit theorem which demonstrates that one can get a reasonable estimate of the mean from a relatively small sample of the population. In hydrologic (and climate) studies we typically have data for several decades, and this is usually sufficient to examine whether the mean has changed or not. However, it is far too short to establish whether the variance has changed (see section 2). Consequently, in standard time series analysis it has become common practice to use the autocorrelation to identify nonstationary behavior (Sun et al 2018, Ukkola et al 2019 (also see supplementary figure S2). With this background, we follow previous work and adopt the widely used practical definition known as 'weak stationarity' (also sometimes known as covariance stationarity), where stationarity is defined by a constant mean and an autocorrelation function that is independent of time (Cox and Miller 1977, Brockwell and Davis 1991, Sun et al 2018, Ukkola et al 2019 (also see section 2). To assess changes in the mean, we use the non-parametric Mann -Kendall test (1990) to evaluate whether the trend in the annual Q series is statistically significant. To assess the autocorrelation, we first calculate the correlation coefficient at different time lags and assess the statistical significance using standard procedures. We find that assessing the autocorrelation at lag-1 year can suffice for a more detailed analysis based on multiple lags and we follow that protocol. With those definitions, a Q time series is considered stationary when both: (a) the mean remains constant (i.e. the trend is statistically insignificant at the specified 95% confidence level); and (b) the lag-1 autocorrelation coefficient is not statistically different from zero at the same 95% confidence level (Cox andMiller 1977, Brockwell andDavis 1991).

Datasets
We use a recent collection of daily and/or monthly streamflow (Q) observations and the catchment boundary data from 22 226 catchments located across the globe (Beck et al 2020). This database was compiled from eight sources including: (a) Global River Discharge Centre (Lehner 2012 . We excluded catchments with less than 30 years continuous Q record and/or having more than 5% missing data. The gaps in the remaining Q series were filled with the climatological averages of observed Q in the remaining years and the resultant daily or monthly time series were aggregated to obtain annual Q series. As a quality control check, we compared the resulting mean annual Q with mean annual precipitation (Beck et al 2020) for each catchment and excluded a further 113 catchments where mean annual Q was higher than mean annual precipitation. As a result, 11 069 catchments with at least 30 years continuous annual Q record were identified and used for the analysis. The longest record is 154 years (1865-2018; the Hankou station representing the mid-Yangtze River basin, see supplementary figure S2) while the average (±1 standard deviation) record length is 56.7 (± 21.1) years (supplementary figure S1).
Three types of human interventions on Q were considered: (a) irrigation; (b) water impoundments (dams and reservoirs); and (c) urban areas (supplementary figure S3). Irrigation areas were derived from the Global Map of Irrigation Areas-GMIA (Siebert et al 2015). The location and capacity of dams and reservoirs were determined based on the Global Reservoir and Dam (GRanD) database version-1.1 (Lehner et al 2011). Urban areas were identified using the 'artificial areas' class of the European Space Agency GlobCover version-2.3 300 m resolution land cover map (www.edenextdata.com/?q=data). A catchment is considered to be impacted by direct human interventions if the irrigation area is more than 1% of the catchment area and/or, if the total reservoir capacity is higher than 1% of mean annual streamflow of the catchment and/or, if the urban area accounts for more than 1% of the catchment area. Using the above criteria, 5172 (or ∼47%) of the 11 069 catchments are impacted by direct human interventions. For the 5172 human impacted catchments, many experienced more than one type of disturbance with 113 catchments concurrently experiencing all three types of disturbance (supplementary table S1). The most common sole disturbances accounted for 60% of all disturbed catchments and included irrigation (n = 1553, 30% of disturbed catchments) and reservoir operations (n = 1555, 30% of disturbed catchments) (supplementary table S1 and figure S4). In addition, to examine the coverages of snow and glacier on annual Q stationarity, we obtained Moderate Resolution Imaging Spectroradiometer (MODIS ) version 6 snow cover extent data from the NASA Distributed Active Archive Center (https://nsidc.org/data/MOD10A2/ versions/6), and global glacier distribution from the CLIMS (Global Land Ice Measurements from Space) Glacier Database (http://glims.colorado.edu/ glacierdata/).

Statistical methods
We used the 'weak stationarity' definition, also known as covariance stationarity, to evaluate the long-term stationarity in annual Q series for the 11 069 catchments (Cox and Miller 1977, Brockwell and Davis 1991, Sun et al 2018, Ukkola et al 2019. Accordingly, an annual Q series is considered stationary when both: (a) the mean is constant; and (b) the autocorrelation is independent of time (Cox andMiller 1977, Brockwell andDavis 1991). For changes in mean, we used the non-parametric Mann -Kendall test (1990) including Sen's slope (1968) to examine if there were a significant trend (at the 95% confidence level) in the annual Q series. For the second part of the test we calculated autocorrelation coefficient (r) at lag k is estimated using (Kendall et al 1983, Wilks 2011: where i is the timestep (year), n is the length of the annual Q series, andQ is the mean annual Q. The 95% confidence limits for r k are calculated following Sun et al 2018: ±1.96/ √ n. It should be noted that the autocorrelation can only be considered to be independent of time when it is within the specified confidence limits at all non-zero lags (Cox andMiller 1977, Brockwell andDavis 1991). For long-term time series, the addition of a low-frequency signal (e.g. long-term trend that is non-stationary) can effectively be identified using just the lag-1 autocorrelation (Sun et al 2018, Ukkola et al 2019. Consequently, a time series is considered non-stationary when the time series contains either a statistically significant trend or statistically significant a lag-1 autocorrelation coefficient; otherwise, the time series is considered stationary. Supplementary figure 2 presents examples of stationary and non-stationary annual Q time series. For completeness, supplementary figure S5 shows autocorrelation at lags 2-10 and the results are basically unchanged by considering those additional lags.

Results
Historical changes in annual Q exhibit a high spatial variability with a small average increase of 0.19 mm yr −2 across all catchments (figure 1(a) and supplementary figure S6). Only a quarter of the trends are statistically significant at the 95% confidence level. Significant increases and decreases in annual Q series are identified in 1520 and 1223 catchments, respectively, accounting for about 13.7% and 11.0% of the catchments (figures 1(a) and (g)). For the autocorrelation, significant (at the 95% confidence level) lag-1 autocorrelations are found in 32.5% of the catchments, with nearly all of those recording a positive autocorrelation (figures 1(b), (h) and supplementary figure S6). The physical interpretation of the positive autocorrelation is that a high (low) Q in 1 year is likely to be followed by another high (low) Q in the subsequent year. Significant negative lag-1 autocorrelations are only found in 11 catchments. For the remaining 67.5% of catchments, the lag-1 autocorrelation is statistically insignificant at the 95% confidence level. Catchments with significant autocorrelation in annual Q series for lags 2-10 are considerably rarer (supplementary figure S5). Combining the results of the trend and autocorrelation analyses, the streamflow in 40.3% of the catchments is found to be non-stationary (figures 2(a) and (b)). As noted previously, the underlying database was not specifically designed for long-term monitoring and includes records of varying length (supplementary figure S1) that could potentially bias the statistical results. Typically, this bias can be assessed by picking overlapping records that start and finish at the same time. However, there are many gauges without overlapping records. As a result, we compare the stationarity statistics between gauges with varying data length and did not found any evidence that the inclusion of records of different length would bias the results (supplementary figure S7).
To evaluate the impacts of human interventions on annual Q stationarity, we classified the 11 069 catchments into two distinct groups: (a) natural catchments where Q is not directly impacted by human interventions (NAT, 5897 catchments); and (b) catchments with direct human interventions (HUMAN, 5172 catchments). Note that the NAT catchments are primarily impacted by climate, while the HUMAN catchments are impacted by both climate and human intervention/s. Three major types of human interventions include: (a) irrigation; (b) reservoir operations; and (c) urban area coverage   (see section 2 and supplementary figure S3). For natural catchments, annual Q in 85.9% of them show a non-significant trend (figure 1(c)) and 82.5% of them show a non-significant lag-1 autocorrelation ( figure 1(d)), which when combined result in a stationary annual Q series in 78.6% of the 5897 catchments (figures 2(a) and (c)). The implication is that historical climate change alone has not broken the stationarity assumption in annual Q series for most (c. 79%) catchments. The conclusion is that the impact of climate change on long-term annual Q has been small relative to the background variability in annual Q in those catchments largely free of human interventions. Conversely, the proportion of non-stationary annual Q series in catchments having direct human interventions is nearly 2.5 times that of natural catchments (figure 2(a)). Significant trends (figure 1(e)) and lag-1 autocorrelations (figure 1(f)) are respectively detected in 36.9% and 49.5% of the 5172 human-impacted catchments, resulting in a non-stationary annual Q series in 61.9% of these catchments (figures 2(a) and (d)). For catchments only affected by a single human intervention type, it is generally not surprising but still encouraging to observe that the proportion of nonstationary annual Q in each sub-group monotonically increased with the overall magnitude of the intervention (figure 3). For example, with the irrigation fractional area within a catchment increasing from 1% to 5% to above 20%, the proportion of catchments with a non-stationary annual Q increased from 51% to 79% (figure 3). When reservoir capacity of a catchment increases from 1% to 10% to above 100% of the mean annual Q, the proportion of catchments with non-stationary annual Q increased from 56% to 76%. Finally, when the urban fraction of catchments increases from 1% to 5% to above 20%, that proportion of catchments with non-stationary annual Q increases from 37% to 79%. Despite a similar pattern in the response of annual Q nonstationarity to the degree of human interventions for all three intervention types, a notable physical difference is that irrigation and reservoir operations favor a (non-stationary) decrease in annual Q while increasing urbanization is associated with non-stationary increases in annual Q (figure 3).
Similar findings that human-impacted catchments show an evident higher proportion of nonstationary annual Q series than NAT catchments obtained at the global scale retains at the continental scale (figure 4). For the six major continents, the proportion of non-stationary annual Q series varies from 16.8% (Oceania) to 36.3% (Africa) in NAT catchments, and increases to 55.4% (Asia) ∼69.1% (South America) in HUMAN catchments. This result suggests that in all major continents: (a) historical natural Q remains largely stationary; and (b) direct human impacts greatly outweigh the climate change impacts on annual Q series. The largest contrast between NAT and HUMAN catchments are found in Oceania, where 59.8% of the HUMAN catchments registered a non-stationary annual Q series, which is about 3.6 times more than that of NAT catchments, followed by North America (3.1 times), Europe (2.7 times), Asia (2.6 times), South America (2.3 times) and Africa (1.8 times).

Discussion
Previous studies have argued that anthropogenic climate change would result in substantial changes in future annual Q (Milly et al 2008(Milly et al , 2015. Here, based on extensive observations, we show that retrospective climate change-induced annual Q changes remain largely stationary across the majority (c. 79%) of global catchments, suggesting that historical climate change-induced annual Q changes are well bounded within the observed natural variability (at the annual scale) in these catchments. In natural catchments, long-term changes in Q are usually controlled by changes in precipitation (Yang et al 2018), and our finding that natural Q remains generally stationary is in line with previous findings that annual precipitation has also remained largely stationary across most terrestrial environments (Sun et al 2018). An important regional exception is the central-north USA where annual Q shows widespread non-stationary increases, likely due to the substantial precipitation increase observed in that region (Sheffield and Wood 2008, Easterling et al 2017. Beside central-north USA, nonstationary annual Q increases are generally concentrated in high latitudes of the North Hemisphere, which is likely caused by enhanced snow and/or glacier melting under warming in these regions (Huss et al 2010, Berghuijs et al 2014, IPCC 2019) (supplementary figure S8). In contrast, non-stationary annual Q decreases are more found in mid-low latitudes and in semiarid regions, which is found to be primarily caused by decreased local moisture recycling and atmospheric moisture inflow (so less precipitation) in these areas under climate change (Zhou et al 2021). The contrast between Q increases in high latitudes and Q decreases in mid-low latitudes during historical periods also persists (and possibly enlarges depends on the scenario of future climate change) in projections of future climate (Milly et al 2005, Dai 2016, Yang et al 2019, implying a strengthening uneven distribution of fresh water resources between these two latitude bands in future climate scenarios. Compared to NAT catchments, non-stationary changes in annual Q occur much more frequently in catchments impacted by direct human interventions. This demonstrates that over the last century, direct human intervention in hydrologic systems has had a much larger effect on annual Q than climate change (Wang et al 2020). This result has been anticipated (Villarini et al 2009). Nevertheless, our analyses herein provide important quantitative measures on the difference between natural and human-impacted annual Q series, which emphasize the sheer scale of hydrologic modification that humans have introduced to the terrestrial water cycle to date at a global scale. With the continuing growth of global population, the hydrologic footprint of human impacts will surely increase as former NAT catchments are converted to HUMAN catchments to satisfy the expanding human water needs. In that regards, it remains to be seen whether and to what extent natural Q will remain stationary over the coming century.
It is worthwhile noting that the definition of 'weak stationarity' also varies in the literature. In some definitions, the second-order statistic (i.e. variance) is involved (Kendall et al 1983). However, assessing changes in variance would require a much larger sample size (i.e. longer time series) that is not available for streamflow observations. To formally evaluate that we used the Q data of 11 069 catchments and evaluated how many years would be needed to detect a change in Q variance. For that we adopted the Cohen's f 2 for effective size (1988) and a power of 0.8 (Tiku 1967) (meaning that there are more than 80% chance of not incurring a type-II error), we estimate that an average data length of 296 years (varying between 12 and 960 years; supplementary figure S9) is needed to assess changes in variance (at the 95% significant level) between the former and latter half of the data series. In fact, when setting the power to be 0.8, we found only 233 (out of 11 069) catchments had the required data length. Nevertheless, we did test if there were variance change in the current annual Q series by comparing annual Q variance in the first and latter half of each time series using the standard F-test. We find that 2254 catchments (or 20%) show a significant (at the 95% confidence level) change in variance, with 2089 of those also showing either a significant trend or significant lag-1 autocorrelation. This suggests that changes in annual Q variance are often associated with changes in mean or a strong autocorrelation. Therefore, whether accounting for variance change or not would not materially change our conclusion.
One limitation of the current study is the use of static human intervention information (i.e. onetime irrigation area, reservation capacity and urban area datasets) to categorize catchments into NAT and HUMAN catchments. However, human interventions on catchment Q also change with time. Yet, time series of human interventions across global catchments are currently not available. In this regard, our finding that annual Q remains largely stationary in NAT catchments while annual Q becomes non-stationary in HUMAN catchments should be viewed on a statistical basis and as a conservative estimate. With a large sample size (more than 10 000 catchments) and equally-sized NAT and HUMAN catchments being analyzed here, we expect the findings represented herein to be relatively robust. Moreover, using static human intervention information may underestimate the numbers of HUMAN catchments (those that were human-impacted but not captured by the human intervention datasets developed more recently). This means that if the dynamic human intervention datasets were available and are used to categorize catchments, the contrast between NAT and HUMAN catchments may be even larger.

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.