Hydrological recurrence as a measure for large river basin classification and process understanding

Hydrological functions of river basins are summarized as collection, storage and discharge, which can be characterized by the dynamics of hydrological variables including precipitation, evaporation, storage and runoff. The temporal patterns of each variable can be indicators of the functionality of a basin. In this paper we introduce a measure to quantify the degree of similarity in intra-annual variations at monthly scale at different years for the four main variables. We introduce this measure under the term of recurrence and define it as the degree to which a monthly hydrological variable returns to the same state in subsequent years. The degree of recurrence in runoff is important not only for the management of water resources but also for the understanding of hydrologic processes, especially in terms of how the other three variables determine the recurrence in runoff. The main objective of this paper is to propose a simple hydrologic classification framework applicable to large basins at global scale based on the combinations of recurrence in the four variables using a monthly scale time series. We evaluate it with lagged autocorrelation (AC), fast Fourier transforms (FFT) and Colwell’s indices of variables obtained from the EU-WATCH data set, which is composed of eight global hydrologic model (GHM) and land surface model (LSM) outputs. By setting a threshold to define high or low recurrence in the four variables, we classify each river basin into 16 possible classes. The overview of recurrence patterns at global scale suggested that precipitation is recurrent mainly in the humid tropics, Asian monsoon area and part of higher latitudes with an oceanic influence. Recurrence in evaporation was mainly dependent on the seasonality of energy availability, typically high in the tropics, temperate and sub-arctic regions. 
Recurrence in storage at higher latitudes depends on energy/water balances and snow, while that in runoff is mostly affected by the different combinations of these three variables. According to the river basin classification, 10 out of the 16 possible classes were present in the 35 largest river basins in the world. In the humid tropic region, the basins belong to a class with high recurrence in all the variables, while in the subtropical region many of the river basins have low recurrence. In the temperate region, whether a basin is energy limited or water limited in summer characterizes the recurrence in storage, but runoff generally exhibits low recurrence due to the low recurrence in precipitation. In the sub-arctic and arctic regions, the amount of snow also influences the classes; more snow yields higher recurrence in storage and runoff. Our proposed framework follows a simple methodology that can aid in grouping river basins with similar characteristics of water, energy and storage cycles. The framework is applicable at different scales with different data sets to provide useful insights into the understanding of hydrologic regimes based on the classification.


Introduction
The hydrological cycle, as one of the main Earth systems, is directly dependent on several periodical cycles with a variety of frequencies. Rotation of the Earth on its own axis, its revolution around the Sun, the revolution of the Moon around the Earth and variations in the Earth's axial tilt are the main causes of temporal variations in the land surface and atmosphere. Variations at seasonal scale are the most recognized patterns in most hydrological processes and play important roles in water resource management. Other climatological changes and additional anthropogenic pressure also add to the complexity of the hydrological cycle.
Regardless of this complexity, the primary role of a river basin in the hydrological cycle can be characterized simply by three main functions: collection, storage and discharge (Black, 1997). The collection function describes the different paths that water supplied by precipitation follows until it reaches a storage component. This collected water is stored in different states and locations within a basin. Water storage, as the first-order state variable of river basins, represents their hydrologic condition and serves as the link between collection and discharge, regulating the timing and amount of collected water to be released. The discharge function refers to the processes that release the stored water, either as evaporation back into the atmosphere or as runoff. Among these functions, the prediction and understanding of the release as runoff has been of high importance for understanding water hazards and for resource management. Nevertheless, as runoff is highly dependent on the other two functions, understanding the dynamics of water collection and storage is indispensable for understanding hydrological processes in river basins.
The importance of storage dynamics has been highlighted with emerging new concepts in watershed hydrology. Fill and spill (Spence and Woo, 2003; Tromp van Meerveld and McDonnell, 2006; Shaw et al., 2012), connectivity (McGlynn et al., 2013) and threshold (Fu et al., 2013; Ali et al., 2013) are a few examples amongst various concepts of runoff generation mechanisms highlighting the importance of water storage and its capacity. Recent studies have demonstrated similar concepts at multiple scales based on water balance analysis (Sayama et al., 2011), combinations of soil moisture and streamflow measurements (Sidle et al., 2000) and numerical simulations (Graham et al., 2010). For larger river basins, only a few studies have identified water storage dynamics in lake/wetland river systems (Spence, 2007; Spence et al., 2010). The stored water volume and its partitioning are also important because they control residence time and source areas (Sayama and McDonnell, 2009), which ultimately influence the sensitivity of the system to climate change (Tague and Peng, 2013). Hence, storage dynamics should be incorporated as a fundamental metric for catchment classifications and comparisons (Wagener et al., 2007; McNamara et al., 2011).
Jothityangkoon and Sivapalan (2009) introduced a simple theoretical framework for classifying different hydrologic regimes based on storage dynamics in different semi-arid and temperate catchments. The framework shows temporal patterns of storage change with a periodic rainfall rate and constant potential evaporation. The amount of runoff generated is assumed to vary significantly depending on whether water storage is below or above the soil moisture at field capacity and saturation. Therefore, with different balances in rainfall, potential evaporation and soil properties, other variables including evaporation, storage and runoff exhibit different temporal patterns, and these are further used for a hydrologic regime classification. The assessment further explores the effects of storminess, seasonality and interannual climate variability on the proposed regimes. Other examples of different approaches for hydrological classification include Weiskel et al. (2014) and the series of papers by Cheng et al. (2012), Coopersmith et al. (2012), Yaeger et al. (2012) and Ye et al. (2012). Coopersmith et al. (2012) derived the classification using the aridity index, seasonality, precipitation peak with respect to potential evaporation and the day of peak runoff for 428 catchments in the United States. This classification was further used to categorize hydrological change by analyzing the conditions of the indicators (Coopersmith et al., 2014). Berghuijs et al. (2014) utilized the seasonal water balance and temporal interaction of variables to group catchments across the United States.
At global scale, several studies have also assessed the interaction of storage variables by using general circulation models (GCMs). Delworth and Manabe (1988) explored the relations between soil moisture and potential evaporation and how these two interacted and affected climate. They further explored the relation of the persistence of soil wetness with the persistence of relative humidity by comparing their lagged autocorrelations (ACs) (Delworth and Manabe, 1989). Also at global scale, the interactions between runoff processes, their feedback with the atmosphere and their effects on a simulated water cycle have been thoroughly studied by Emori et al. (1996). Macroscale effects of water and energy supplies (Milly and Dunne, 2002) and their influence on river discharge have also been analyzed using observed data and GCMs (Milly and Wetherald, 2002). For river basin characterization with storage information, Masuda et al. (2001) used basin and atmosphere budgets to evaluate water storage and described similarities among storage patterns for major basins in the world. More recently, Kim et al. (2009) used two indices to quantify the significance of different storage components in terrestrial water storage, namely subsurface storage, snow and river storage, and described their behavior in 29 basins.
The objective of the study is to propose a classification framework for large river basins employing the temporal patterns in precipitation, evaporation, storage and runoff, utilizing a global data set. We follow the frameworks of Masuda et al. (2001), Jothityangkoon and Sivapalan (2009) and Kim et al. (2009) in terms of analyzing the temporal variations of the four main hydrological variables in different climatologies to find similarities and dependencies in runoff generation and variable interactions. Among a variety of metrics, this study focuses on the recurrence of hydrologic variables, defined as the degree to which a monthly hydrological variable returns to the same state in subsequent years. The reason for choosing recurrence as a metric is practical. The recurrence of runoff and of the other three hydrological variables is of high importance from a water management perspective. For example, Fig. 1 compares monthly runoff from two different basins with high and low recurrence characteristics. Although total runoff volume and seasonality are obviously dominant factors for water resource management, and therefore many previous classification studies have focused on metrics that represent them (Weingartner et al., 2013), anthropogenic systems have already adapted to the local hydrological regimes to some extent. Generally, it is more challenging for water managers to handle a random pattern with high fluctuations that differs from past experience, such as floods and droughts happening at unexpected magnitudes in unexpected seasons. The feature of our proposed classification is to show which variables are recurrent or nonrecurrent and how different combinations of recurrence (i.e., our proposed river basin classes) are distributed in the world.
Section 2 describes the data used in this study, followed by the methodology to calculate recurrence and classify large river basins in the world in Sect. 3. Section 4 presents the results and regional characteristics of the basins. In Sect. 5, we discuss the relationship between our classification and other metrics including aridity, seasonality and phasing between water and energy cycles, as well as future application of the proposed classification.

Data
This study uses the WATCH Forcing Data for the 20th Century (WFD) and the WATCH 20th Century Model Output from the Water Model Intercomparison Project (WaterMIP) data sets provided by EU-WATCH. The forcing data are based on the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA-40 data (Weedon et al., 2010, 2011). The model output data set represents contemporary naturalized conditions, with no human interaction such as reservoirs or agricultural withdrawals, at 0.5° spatial resolution (Haddeland et al., 2011). The EU-WATCH project includes land surface models (LSMs) and global hydrological models (GHMs), which are distinguished by whether or not they solve the energy balance.
1. Precipitation: precipitation is provided as part of the WFD data set. LSMs require rainfall and snowfall as independent inputs, both provided by the WFD data set, whereas GHMs use their own algorithms to separate rainfall and snowfall, using total precipitation as input. Since the partitions within the GHMs are not available in the provided EU-WATCH data set, this study used total precipitation, i.e., the sum of rainfall and snowfall, for the classification.
2. Evaporation: simulated evaporation for each model is provided as total flux without distinction of its source (transpiration from vegetation, bare soil evaporation, sublimation, etc.).
3. Runoff: simulated surface and subsurface runoff for each model are provided independently. However, since the partitions between surface and subsurface runoff differ significantly among models, total runoff is used in this study. River discharge is also provided for some models, but for comparative purposes runoff generated from the land surface is selected for the classification.
4. Storage: storage is defined in this study as the total amount of water held in a basin regardless of its physical state or location. Table 1 summarizes the different storage components aggregated to estimate the total storage. In the discussion, further analysis is conducted by using individual components to understand their influence.
The time period selected for the analysis is from 1979 to 2001 at a monthly scale. The original data, including precipitation, evaporation, storage and runoff, were first analyzed at each grid cell to test their recurrences, as explained in the next section. Then for the world's largest 35 river basins (Fig. 2), the variables were aggregated within each basin and their recurrences calculated to classify the basins.

Quantifying recurrence
This section introduces three metrics for evaluating recurrence: AC, fast Fourier transform (FFT) intensity and Colwell's index of contingency (Colwell, 1974). In this study, since our interest is the recurrence of monthly variables as defined above, we used a period of 12 months for each metric. The definitions are described below and their characteristics are discussed in Sect. 5.2.

Lagged autocorrelation
A serial AC, defined in Eq. (1), describes the correlation of a time series with itself at time lag $k$:

$$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}, \qquad (1)$$

where $r_k$ is the AC coefficient for lag $k$, $N$ is the total number of observations and $\bar{x}$ is the mean. This AC calculation loses intensity as the lag increases, dying down to zero as the lag approaches $N$. The AC can further be calculated in terms of the covariance, but this computation is considered a biased calculation of AC. In order to avoid the biased calculation and still be able to calculate a correlation between partial series with larger lags, the lagged series can be treated as a totally separate series with a different mean and variance, and the AC can be computed as a simple correlation with the following equation:

$$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_1)(x_{t+k} - \bar{x}_2)}{\sqrt{\sum_{t=1}^{N-k} (x_t - \bar{x}_1)^2 \, \sum_{t=1}^{N-k} (x_{t+k} - \bar{x}_2)^2}}, \qquad (2)$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the first $N-k$ and the last $N-k$ observations, respectively.

For the recurrence measure with monthly time series, evaluating the AC at a time lag of 12 only is insufficient because it would only take into account the recurrence in contiguous years. We find it more appropriate to include the AC at other multiples of 12. Given the length of the time series used in this study, we decided to use the mean of the AC at time lags 12, 24, 36, 48 and 60. The results will also depend on the temporal resolution (e.g., daily or yearly time series). However, in this study we decided to use a monthly resolution and look at yearly cycles because 1 year is usually the unit at which most human activities and natural cycles repeat themselves.
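As a rough illustration of the recurrence measure described above, the following Python sketch computes the correlation between the partial series at each lag and averages it over lags 12, 24, 36, 48 and 60. The function names `lagged_correlation` and `recurrence_ac` and the synthetic input series are assumptions made for this example; they are not taken from the study's own processing chain.

```python
import numpy as np

def lagged_correlation(x, k):
    """Correlation between a series and itself shifted by k steps.

    The two partial series are treated as separate series, each with
    its own mean and variance (simple correlation, k must be >= 1).
    """
    x = np.asarray(x, dtype=float)
    a, b = x[:-k], x[k:]          # first N-k and last N-k observations
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else float("nan")

def recurrence_ac(x, lags=(12, 24, 36, 48, 60)):
    """Mean lagged correlation over multiples of 12 months."""
    return float(np.mean([lagged_correlation(x, k) for k in lags]))

# Synthetic monthly series (hypothetical, not EU-WATCH data): 23 years.
months = np.arange(23 * 12)
seasonal = np.sin(2 * np.pi * months / 12)                  # repeats every year
rng = np.random.default_rng(0)
noisy = seasonal + rng.normal(scale=2.0, size=months.size)  # weakly recurrent

print("seasonal:", recurrence_ac(seasonal))
print("noisy:   ", recurrence_ac(noisy))
```

A series that repeats identically every year gives a value close to 1, while adding year-to-year noise lowers the value.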

Fast Fourier transforms
The other metric tested in this study is the FFT intensity, which can identify important periods based on a periodogram. The periodical part of a time series can be described by the following equation:

$$m_\tau = \mu + \sum_{i=1}^{h} \left[ A_i \cos\left(\frac{2\pi i \tau}{p}\right) + B_i \sin\left(\frac{2\pi i \tau}{p}\right) \right], \qquad (3)$$

where $m_\tau$ is the harmonically fitted mean, $\mu$ is the population mean, $A_i$ and $B_i$ are the Fourier coefficients, $p$ is a period (12 for monthly data) and $h$ is the total number of harmonics (usually $p/2$).

The Fourier coefficients are calculated as

$$A_i = \frac{2}{p} \sum_{\tau=1}^{p} x_\tau \cos\left(\frac{2\pi i \tau}{p}\right), \qquad (4)$$

$$B_i = \frac{2}{p} \sum_{\tau=1}^{p} x_\tau \sin\left(\frac{2\pi i \tau}{p}\right). \qquad (5)$$

The intensity of harmonic $i$ is then calculated from the coefficients $A_i$ and $B_i$ (Eq. 6). The FFT intensity is important for identifying the periodicity at a particular frequency. A peak in the plot of intensity vs. frequency (periodogram) identifies a frequency for which a periodical pattern is found. For most hydrological data a peak exists at a frequency equivalent to a year (i.e., 12 months for monthly data, 52 weeks for weekly data and 365 days for daily data). If a series follows a pattern similar to a sinusoidal function, the intensity will be higher than for a series departing from this pattern. Additionally, if a series contains much noise the intensity will also be reduced. Hence, a recurrent pattern shows higher FFT intensity. Since the FFT intensity is sensitive to the amplitude and magnitude, we applied a standard normalization. Discussion on the characteristics and capability of FFT to measure recurrence is provided in Sect. 5.2.

Colwell (1974) introduced the indices of constancy and contingency, which together form the index called predictability. These indices have been used to analyze physical and biological temporal fluctuations. The index has been used widely in the analysis of flowering trees (Colwell, 1974), variations in river temperature (Vannote and Sweeney, 1980), variations in flow velocity (Riddell and Leggett, 1981), rainfall distribution on a yearly basis (Miller, 1984), periodicity analysis in streamflow or rainfall data (Gan et al., 1991), classification of flow regimes for environmental flow assessments (Zhang et al., 2012) and description of waterholes in hydrological regimes (Webb et al., 2012). Colwell (1974) defined predictability as the measure of the certainty of knowing a state at a given time. It is the sum of two components: constancy, which represents how uniform the state of a variable is across different time cycles, and contingency, which measures the degree to which state and time are dependent on each other.
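As a rough illustration of the FFT intensity at the annual period discussed in this section, the sketch below standardizes a monthly series and reads the Fourier coefficients at the 12-month frequency from NumPy's real FFT. The scaling used here, $(A^2 + B^2)/2$ of the standardized series (the share of variance carried by the annual harmonic), is an assumption chosen for illustration and is not necessarily the intensity definition used in this study; the function name `annual_intensity` is likewise hypothetical.

```python
import numpy as np

def annual_intensity(x, period=12):
    """Share of variance at the `period`-step harmonic of a standardized series.

    Assumed scaling for illustration: (A^2 + B^2) / 2, which equals 1 for a
    pure sinusoid with the given period and decreases with noise.
    """
    x = np.asarray(x, dtype=float)
    n = (x.size // period) * period      # keep whole cycles only
    x = x[:n]
    sd = x.std()
    if sd == 0:
        return 0.0                       # constant series has no periodicity
    z = (x - x.mean()) / sd              # standard normalization
    spec = np.fft.rfft(z)
    k = n // period                      # FFT bin of the annual frequency
    a = 2.0 * spec[k].real / n           # cosine coefficient
    b = -2.0 * spec[k].imag / n          # sine coefficient
    return float((a * a + b * b) / 2.0)

# Synthetic monthly series (hypothetical, not EU-WATCH data).
months = np.arange(23 * 12)
seasonal = np.sin(2 * np.pi * months / 12)
rng = np.random.default_rng(0)
noisy = seasonal + rng.normal(scale=2.0, size=months.size)

print("seasonal:", annual_intensity(seasonal))
print("noisy:   ", annual_intensity(noisy))
```

Truncating the series to whole years keeps the annual frequency on an exact FFT bin, which avoids spectral leakage in this simple example.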

Colwell's contingency index
Calculation of Colwell's index requires first categorizing the continuous data to prepare a matrix. The columns of the matrix represent time categories and the rows represent the states of a phenomenon. In this study the columns represent different months and the rows represent ranges of standard deviations: the range of ±4σ is divided equally into 16 categories with intervals of 0.5σ. Now let $N_{ij}$ be the number of times that a variable falls in state $i$ at time step $j$. The sum over all columns for each state $i$ is $X_i$, the sum over all rows for each time step $j$ is $Y_j$ and the total number is $Z$. Then the contingency ($M$) of Colwell's index is defined as

$$M = \frac{H(X) + H(Y) - H(XY)}{\log s},$$

where $s$ is the number of rows, and $H(X)$, $H(Y)$ and $H(XY)$ are defined as

$$H(X) = -\sum_{i} \frac{X_i}{Z} \log \frac{X_i}{Z},$$

$$H(Y) = -\sum_{j} \frac{Y_j}{Z} \log \frac{Y_j}{Z},$$

$$H(XY) = -\sum_{i} \sum_{j} \frac{N_{ij}}{Z} \log \frac{N_{ij}}{Z}.$$

Contingency becomes 1 if a variable is at the same state at a particular time step, while the index becomes 0 if the occurrences at all time steps fall in the same state. Contingency will be higher as more occurrences at a particular time happen in a particular state. If the values of a variable in a given month are similar, they will fall under the same state interval. This will be the case for variables with high recurrence. Further discussion on the capacity of Colwell's index to represent the concept of recurrence is given in Sect. 5.2. For reference, with the notation above, the constancy ($C$) and predictability ($P_d$) are defined as

$$C = 1 - \frac{H(X)}{\log s},$$

$$P_d = 1 - \frac{H(XY) - H(Y)}{\log s},$$

so that $P_d = C + M$.
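The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the series is standardized with its own mean and standard deviation, values beyond ±4σ are placed in the outermost states, and the names `entropy` and `colwell_indices` are hypothetical helpers rather than part of any published code.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a table of counts."""
    c = np.asarray(counts, dtype=float).ravel()
    p = c[c > 0] / c.sum()
    return float(-(p * np.log(p)).sum())

def colwell_indices(x, period=12, n_states=16, half_range=4.0):
    """Return (contingency M, constancy C, predictability P) for a series."""
    x = np.asarray(x, dtype=float)
    n = (x.size // period) * period          # keep whole cycles only
    x = x[:n]
    sd = x.std()
    z = (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)
    # States: equal intervals of 0.5 sigma between -4 and +4 sigma.
    edges = np.linspace(-half_range, half_range, n_states + 1)
    state = np.clip(np.digitize(z, edges) - 1, 0, n_states - 1)
    month = np.arange(n) % period
    N = np.zeros((n_states, period))         # rows: states, columns: months
    np.add.at(N, (state, month), 1)
    log_s = np.log(n_states)
    h_state = entropy(N.sum(axis=1))         # entropy of state totals
    h_time = entropy(N.sum(axis=0))          # entropy of time-step totals
    h_joint = entropy(N)
    M = (h_state + h_time - h_joint) / log_s
    C = 1.0 - h_state / log_s
    P = 1.0 - (h_joint - h_time) / log_s
    return M, C, P

# Hypothetical climatology repeated identically for 20 years, and pure noise.
clim = np.array([1, 2, 4, 7, 9, 8, 6, 4, 3, 2, 1, 1], dtype=float)
repeating = np.tile(clim, 20)
noise = np.random.default_rng(1).normal(size=240)

print("repeating:", colwell_indices(repeating))
print("noise:    ", colwell_indices(noise))
```

For the exactly repeating series every month always falls in one state, so predictability reaches 1; for uncorrelated noise the contingency is much lower.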

Hydrological classification
The variables considered in this study are precipitation P, evaporation E, runoff Q and storage S, which compose the general hydrological cycle and are the main components of the water balance equation. At global scale or basin scale, each of the four variables is identified as being of high or low recurrence based on the description in the previous sections.
The first-order division of the classification is whether runoff has high or low recurrence, followed by precipitation, evaporation and storage. As graphical guidance we introduce a classification tree in Fig. 3. The figure shows the 16 possible classes and indicates which combinations were and were not found among the basins of this study. It is provided as a guide for understanding subsequent figures. We used runoff as the first variable for the classification as it is the main concern for water resource management, and the other three variables are further used to explain why the runoff in each basin or region shows high or low recurrence. The value used for classifying the basins as high or low recurrence was an AC of 0.75. First we quantified recurrence at global scale except for Greenland, where model performance was questionable due to its particular conditions, and Antarctica, which the EU-WATCH product does not cover. This global analysis was performed for the given time series of each variable at each individual grid cell. The analysis for the world's largest 35 basins was performed for the time series of each variable considering the spatial average of the grid cells included within the limits of the basin. Among all the model output from EU-WATCH, we paid particular attention to the WaterGAP model results because it is the only model that includes a calibration module and is closest to observations (Haddeland et al., 2011). Meanwhile, all other model results are also analyzed to cover different model behaviors and discuss model uncertainty (Sect. 5).
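The classification rule described above can be sketched in a few lines of Python. The class labels follow the naming used in this paper (the letters of the recurrent variables in the order Q, P, E, S, and L when none is recurrent); the function name `classify_basin`, the treatment of a value exactly equal to 0.75 as high recurrence and the input values in the usage line are assumptions made for illustration.

```python
from itertools import product

ORDER = ("Q", "P", "E", "S")   # runoff, precipitation, evaporation, storage

def classify_basin(recurrence, threshold=0.75):
    """Map the recurrence of the four variables to a class label.

    `recurrence` is a dict with keys Q, P, E and S. A variable counts as
    recurrent when its value is >= threshold (assumed handling of equality).
    """
    label = "".join(v for v in ORDER if recurrence[v] >= threshold)
    return label or "L"        # L: none of the variables is recurrent

# All 16 possible classes, one per high/low combination of four variables.
ALL_CLASSES = [
    "".join(v for v, high in zip(ORDER, combo) if high) or "L"
    for combo in product((True, False), repeat=4)
]

# Hypothetical recurrence values, not results from this study.
print(classify_basin({"Q": 0.40, "P": 0.60, "E": 0.90, "S": 0.80}))
```

With the hypothetical values shown, only evaporation and storage exceed the threshold, so the basin falls in the ES class.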

Results
In this section, we first describe the results of recurrence based on AC from the WaterGAP model as the representative case. WaterGAP is selected here as it is the only model with a simple calibration module and has better agreement with observations (Haddeland et al., 2011). Autocorrelation fits our goal as it precisely measures the degree of similarity of each year when lagged by 12 months. Section 5 discusses the differences in results for the other metrics and for the other models. Figure 4 shows the global distribution maps of the recurrence (i.e., AC in this case) in the four variables: precipitation, evaporation, storage and runoff. From the recurrence calculated for each variable's time series, each grid cell was marked in red for very low recurrence (< 0.5), yellow for low recurrence (0.5-0.75) and green for high recurrence (0.75-1.0). To explain the distribution of the recurrences in the four variables, this paper uses the following terms for different latitude zones in both hemispheres: tropical (0-23.5°), subtropical (23.5-35°), temperate (35-55°) and sub-arctic and Arctic (55-90°).
The precipitation in the tropical region is basically characterized by the seasonality caused by the oscillation of the intertropical convergence zone (ITCZ), and by energy supply that varies with the effects of the Earth's axial tilt. Because of this seasonality, two bands between 5 and 23.5° in both hemispheres show high recurrence in all variables, while recurrence is generally lower in the equatorial band between 5° S and 5° N, where there is no seasonality. The rest of the variables generally follow the same pattern as precipitation, although the high recurrence areas of storage and runoff are comparatively smaller than that of precipitation.
The subtropical region is mainly characterized by the latitudinal desert belts. This region is characterized by low humidity and general dryness in soil conditions. In this region, precipitation events are typically sudden and intense without following certain temporal patterns. During rainfall events the other variables also behave similarly. Hence, all four variables tend to have low recurrence. The Southeast Asia monsoon area is an exception since its behavior is similar to that of the humid tropics, therefore displaying high recurrence in all variables.
The temperate region also shows generally low recurrence in precipitation due to continental climates or oceanic climates with no dry season. Eastern Asia is the only region showing high recurrence, due to the effects of the Asian monsoon. Evaporation in this region has high recurrence due to seasonality, with the exception of dry areas in Europe and Asia. Storage has different geographic patterns throughout the region. Runoff follows the same regionalization as storage except for Europe, where recurrence is comparatively low in general.
Precipitation in the sub-arctic and arctic region shows low recurrence except for some areas in North America and eastern Siberia. Evaporation exhibits higher recurrence in this area. The areal extent of high recurrence in storage and runoff is larger in this region, which is mainly attributed to the amount of snow.
By taking the spatial average of each variable inside the 35 largest river basins in the world, we calculated recurrence and classified them following the tree illustrated in Fig. 3. Figure 5 shows the result of the classification, which is described below according to each latitude region. Figure 6 displays graphically the results of the calculations of recurrence for each variable. The figure shows the results of the calculated recurrence from the WaterGAP model output and also shows the maximum, minimum, mean and interquartiles of recurrence calculated using the other models. Table 2 summarizes the characteristics of each class.

Table 1. Overview of models included in this research and their characteristics. Adapted from Haddeland et al. (2011) and Gudmundsson et al. (2012a, b). Model names in bold are considered as LSMs. Precipitation input is either provided as total precipitation (P) or as rainfall (R) and snowfall (S) separately. Storage can be handled in models as ground moisture (GM), soil moisture (SM), surface storage (SS) and snow water equivalent (SWE). Potential evaporation (E_P) is provided (yes) or not provided (no).

Tropical region (0.0-23.5°)
The tropical region has the greatest diversity of classes. In this region we found basins belonging to the QPES, QPS, PES, PE and E classes. Two distinct patterns are observed in runoff. High recurrence in runoff takes place in the most humid basins, exemplified in Fig. 7a by QPES and in Fig. 7b by QPS. Consistent with the global analysis results, we found that precipitation is highly recurrent for these classes due to a repeating pattern resulting from the oscillation of the ITCZ. Evaporation and storage are also highly recurrent as they follow the same pattern as precipitation, as can be seen in the Amazon time series in Fig. 8a. In the Orinoco Basin evaporation remains rather constant, as the basin is energy limited and potential evaporation is constant, resulting in low recurrence in evaporation. Storage, on the other hand, follows the same pattern as precipitation, resulting in a highly recurrent pattern.
More than half of the basins in the tropics exhibit a low recurrence pattern in runoff. These basins are exemplified by PES and PE in Figs. 7 and 8. They are drier, with lower runoff ratios than basins with recurrent runoff, and are water limited in some periods of the year. Precipitation shows high recurrence because the availability of moisture is related to the ITCZ. In these classes evaporation follows the same pattern as precipitation, following the moisture availability pattern. Storage has high recurrence in PES basins mainly because they are characterized by peaks in precipitation and potential evaporation taking place at different times of the year, as seen in the Zambezi River's climatology in Fig. 7. As a result the storage fluctuates strongly because the soil moisture component fills in the wet season and nearly dries out in the dry season (Fig. 8c and storage component climatology of the Zambezi Basin in the Supplement). This creates a strong seasonal pattern in total storage, leading to high recurrence.
The PE class is characterized by potential evaporation and precipitation peaking at the same time (Fig. 7d: PE). Compared to the Amazon, average precipitation is much lower but potential evaporation is almost the same. The Congo Basin can be energy limited (P > E T − E P) in the wet season; therefore, regardless of the amount of precipitation, evaporation will reach its potential, creating a more recurrent pattern in evaporation. The anomalies in precipitation transfer directly to storage and runoff variations, and since the runoff ratio (Q/P) and storage change ratio (ΔS/P) are much smaller, these anomalies represent larger relative fluctuations in these variables; hence, recurrence in storage and runoff patterns is low. The Sao Francisco Basin is an exception in this region, with recurrence only in evaporation. This type of basin is mainly seen in the temperate region and is explained in detail in Sect. 4.3.

Subtropical region (23.5-35.0°)
In the subtropical region, mainly two patterns of classes are observed. On the one hand, QPES river basins are located in the Southeast Asian monsoon area, where they behave similarly to river basins of the same class in the tropical region. On the other hand, there are basins that are extremely dry, represented by the Orange Basin in Fig. 7. In the latter basins, all variables follow the patterns of precipitation, being sudden, abrupt and lacking any defined temporal distribution, leading to class L (i.e., none of the variables are recurrent). The Indus River basin is an exception in this region, belonging to the E class.

Temperate region (35.0-55.0°)
In the temperate region three particular classes are observed: PE, ES and E. All of these classes have low recurrence in runoff and high recurrence in evaporation due to the seasonality in energy supply.
Basins located in eastern Asia belong to the PE class, which was explained previously in the tropical region section. The reasons why this class occurs in the temperate region are the same as for the tropical region, i.e., the recurrence in precipitation comes from the moisture supply following the Asian monsoon pattern.
A dominant class in this region is the ES class, exemplified by the Mississippi Basin in Fig. 7. In this type of basin the precipitation pattern is not recurrent and has no distinct dry season. Storage is recurrent in these basins as a result of the energy balance characteristics. Due to the limited energy during the winter season, precipitation is directly transferred to a storage increase. During summer, the basins in this class are water limited, and therefore most of the precipitated water is evaporated, allowing storage to decrease. In these basins there is some influence of snow; however, the amount of snow is not high enough to create a recurrent runoff pattern.
Another group in the temperate region is characterized by recurrence in evaporation only, as exemplified by the Danube River basin. In these basins, precipitation has a pattern of low recurrence that transfers to storage and runoff. Compared to the Mississippi, the Danube River basin is not energy limited during summer. This creates a pattern whereby the anomalies and low recurrence of precipitation also transfer to storage, thereby reducing its recurrence.

Sub-arctic and arctic region (55.0-90.0°)
... energy supply. All of the basins in this region except Kolyma have recurrent runoff. The runoff pattern is dominated by snowmelt, which takes place similarly year after year, as observed in the sudden peak in runoff during spring (Fig. 7h-j).
Basins belonging to the QPES and QPE classes have high recurrence in precipitation due to moisture inflow from the ocean (Figs. 4a and 5). The recurrence in storage is dependent on the amount of snow. The climatologies of these basins (Fig. 7h-j) show that storage peaks during the winter months due to the accumulation of snow. Figure 9 shows the climatology of storage in these basins further subdivided into the volume of the different components. Table 3 shows the component contribution ratio (CCR) (Kim et al., 2009), which describes the contribution of the variation of each storage component to the variation of total storage. As can be seen, in these basins the highest contribution comes from snow. The WaterGAP model in particular has a small groundwater tank which includes only the dynamical part, making it small in volume and contribution. Figures 10 and 11 show the snow water equivalent (SWE) and seasonal precipitation amounts. From these two figures, we can observe that basins with higher snow accumulation have higher recurrence both in storage and runoff.
Basins without recurrent precipitation (QES and QE) are located in continental areas and experience precipitation patterns with no defined dry period. From Figs. 9, 10 and 11 we can also conclude that whether storage is recurrent in these basins depends on the amount of snow; higher SWE and winter precipitation are linked to higher recurrence. For this region, the recurrence in storage and runoff is independent of the recurrence in precipitation, but it depends on the precipitation and snow amounts.

Recurrence vs. seasonality
This section discusses the characteristics of recurrence measured by AC from monthly variables at lags that are multiples of 12 months. First, we compare recurrence with seasonality, following the definition of the seasonality index (SI) of Walsh and Lawler (1981):

$$\mathrm{SI} = \frac{1}{R}\sum_{n=1}^{12}\left|x_n - \frac{R}{12}\right|,$$

where $x_n$ is the mean rainfall of month $n$ and $R$ is the annual mean of a hydrological variable (the mean annual total in Walsh and Lawler's original rainfall formulation). Hence, the seasonality measures the degree to which each monthly variable of a regime curve deviates from the overall annual mean. Seasonality is essentially different from recurrence, which, as defined above, measures the degree to which a monthly hydrological variable returns to the same state in subsequent years.
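The two quantities compared in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the series length is assumed to be a whole number of years, and the lagged AC is taken here as the Pearson correlation between the series and its lagged copy, averaged over lags of 12 to 60 months; the estimator used in the study may differ in detail.

```python
import numpy as np

def seasonality_index(monthly):
    """Walsh and Lawler (1981) style seasonality index of a monthly series."""
    x = np.asarray(monthly, dtype=float).reshape(-1, 12)  # rows: years, cols: months
    clim = x.mean(axis=0)        # climatology: mean of each calendar month (x_n)
    annual = clim.sum()          # mean annual total (R)
    return float(np.abs(clim - annual / 12.0).sum() / annual)

def recurrence_ac(monthly, max_years=5):
    """Mean lagged correlation at lags 12, 24, ..., 12 * max_years months.

    Assumption: Pearson correlation between x[t] and x[t + lag].
    """
    x = np.asarray(monthly, dtype=float)
    corrs = [np.corrcoef(x[:-12 * k], x[12 * k:])[0, 1]
             for k in range(1, max_years + 1)]
    return float(np.mean(corrs))

# Synthetic 30-year series that repeats exactly the same annual cycle.
months = np.arange(12 * 30)
cycle = 50.0 + 40.0 * np.sin(2.0 * np.pi * months / 12.0)

si_cycle = seasonality_index(cycle)
ac_cycle = recurrence_ac(cycle)
print(f"seasonality index: {si_cycle:.3f}, recurrence (AC): {ac_cycle:.3f}")
```

Because the synthetic series repeats identically every year, its recurrence is essentially 1 regardless of how large or small its seasonality index is, which is the distinction drawn in the text.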
Figure 12 displays the relationship between recurrence and seasonality for all the time series in the study, including each variable from every basin. The figure suggests that variables with higher seasonal variability generally tend to have higher recurrence. This is because, if a variable has strong seasonality, deviations from the climatology have comparatively less impact on the AC. Appendix A shows the distribution of seasonality and recurrence in all variables. Nevertheless, there are exceptions where variables are highly seasonal but not recurrent. For example, Fig. 13 shows the monthly average precipitation in Ob and Yenisei. The two basins are located in the same latitudinal region and share a border. The climatologies of both basins are similar, with comparable magnitudes in all months. However, the year-to-year variability in the two basins is different: Ob shows higher variations than Yenisei. Therefore, the precipitation in Ob has lower recurrence (0.65) than that in Yenisei (0.88). Similar cases can be observed when comparing the climatologies shown in Fig. 7 with the measure of recurrence presented in Fig. 6, and in previous work (e.g., Kim et al., 2009) where storage climatologies show strong seasonality but the yearly time series does not behave in a recurrent manner.
To further explain the difference between recurrence and seasonality, we use Fig. 14 [...] recurrence. Case 3 and case 4 are the precipitation of Yenisei and Ob, which have similar seasonality but high recurrence in Yenisei and low recurrence in Ob, as discussed above. Case 5 is a sinusoidal pattern that repeats exactly the same values and shows high seasonality and recurrence. Case 6 adds a decreasing trend to case 5 but keeps similar seasonality and recurrence. In summary, seasonality is calculated from the climatology of a variable, which results from a long-term average, while recurrence measures the year-to-year variability of the monthly pattern of a variable. Recurrence is an additional feature of the temporal patterns of basins and provides different information than seasonality.
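The contrast between the Yenisei-like and Ob-like cases can be reproduced with synthetic data. The sketch below is illustrative only: the two series are invented (not the basin data), they share one sinusoidal climatology and differ only in the size of the random year-to-year anomalies, and the helper names, noise levels and the Pearson form of the lagged correlation are assumptions.

```python
import numpy as np

def mean_lagged_corr(x, max_years=5):
    """Mean Pearson correlation at lags of 12, 24, ..., 12 * max_years months."""
    x = np.asarray(x, dtype=float)
    return float(np.mean([np.corrcoef(x[:-12 * k], x[12 * k:])[0, 1]
                          for k in range(1, max_years + 1)]))

def seasonality(x):
    """Seasonality index computed from the long-term monthly climatology."""
    clim = np.asarray(x, dtype=float).reshape(-1, 12).mean(axis=0)
    return float(np.abs(clim - clim.mean()).sum() / clim.sum())

rng = np.random.default_rng(42)
months = np.arange(12 * 30)                      # 30 synthetic years
base = 60.0 + 40.0 * np.sin(2.0 * np.pi * months / 12.0)

# Same climatology, small versus large interannual anomalies (clipped at zero).
steady = np.clip(base + rng.normal(0.0, 3.0, base.size), 0.0, None)
erratic = np.clip(base + rng.normal(0.0, 25.0, base.size), 0.0, None)

si_steady, si_erratic = seasonality(steady), seasonality(erratic)
r_steady, r_erratic = mean_lagged_corr(steady), mean_lagged_corr(erratic)
print(f"steady : seasonality={si_steady:.2f}, recurrence={r_steady:.2f}")
print(f"erratic: seasonality={si_erratic:.2f}, recurrence={r_erratic:.2f}")
```

Both series yield a similar seasonality index because averaging over many years removes most of the anomalies, whereas the lagged correlation of the noisier series is clearly lower.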

Recurrence vs. aridity
Recurrence in runoff and storage is also related to the aridity of a basin as well as to the timing of energy and water availability. These basin characteristics are essential in determining a basin's functionality, as they describe how much water from precipitation is transferred to evaporation, storage change or runoff, and they have been included as classification indices in previous works such as Jothityangkoon and Sivapalan (2009), Coopersmith et al. (2012, 2014) and Berghuijs et al. (2014). Figure 15 shows the relations of aridity and the timing of peaks in precipitation (water supply) and E_P (energy supply) with recurrence in runoff and storage by region. Figure 15a and b show that in humid basins, where the runoff ratio and the storage change ratio are high, runoff and storage follow the patterns in precipitation. Drier basins have low recurrence in runoff (classified as PES, PE, ES or E), essentially due to the high sensitivity of runoff to precipitation under smaller runoff ratios. For example, the Amazon and Congo basins, discussed in Sect. 4.1, differ in the recurrence of storage and runoff. Precipitation has similar relative variations in both basins, but the total precipitation in Congo is about 70 % of that in Amazon. Additionally, the runoff ratio is smaller in Congo (0.4) than in Amazon (0.45). Physically, this means that a smaller volume of water is transferred from precipitation into storage fluctuation and runoff generation in Congo. Hence, the same anomalies in precipitation have a larger impact in Congo than in Amazon. Furthermore, the recurrence of storage and runoff also depends on the timing of the P and E_P peaks. As Fig. 15c and d indicate, the recurrence becomes higher if P and E_P are out of phase (> 2 months).
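The basin descriptors used in this comparison can be computed from monthly climatologies as in the following sketch. The definitions are assumptions made for illustration: aridity is taken as the ratio of annual E_P to annual P, the runoff ratio as annual Q over annual P, and the phase difference as the circular distance in months between the peak months of P and E_P. The function name and the example values are hypothetical and are not data from the study.

```python
import numpy as np

def basin_descriptors(p, ep, q):
    """Aridity, runoff ratio and P/E_P peak offset from 12 monthly means.

    p, ep, q: climatological monthly precipitation, potential evaporation
    and runoff, all in the same units (assumed inputs).
    """
    p, ep, q = (np.asarray(v, dtype=float) for v in (p, ep, q))
    aridity = ep.sum() / p.sum()          # assumed definition: E_P / P
    runoff_ratio = q.sum() / p.sum()      # Q / P
    shift = abs(int(np.argmax(p)) - int(np.argmax(ep)))
    phase_lag = min(shift, 12 - shift)    # circular distance in months (0 to 6)
    return float(aridity), float(runoff_ratio), phase_lag

# Hypothetical climatology: P peaks in month 1, E_P peaks in month 11.
p_clim = [20.0] + [10.0] * 11
ep_clim = [5.0] * 10 + [10.0] + [5.0]
q_clim = [6.5] * 12

aridity, runoff_ratio, phase_lag = basin_descriptors(p_clim, ep_clim, q_clim)
print(aridity, runoff_ratio, phase_lag)
```

The circular distance matters because a December peak and a February peak are two months apart, not ten.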

Recurrence measured by FFT intensity and Colwell's contingency compared to AC
The proposed indices to measure recurrence are lagged AC, FFT intensity and Colwell's indices. In most cases, the basins that show higher AC also have higher values of FFT intensity and Colwell's predictability. However, some basins showing lower AC and FFT intensity have high Colwell predictability, especially in dry conditions. For example, in the arid basins where all the variables are low most of the time except for abrupt peaks, AC and FFT intensity are low, while Colwell's constancy and predictability are high. However, these basins are rather low in Colwell's contingency (Table 4). Contingency measures the degree to which state and time are dependent on each other, that is, the degree to which a particular state takes place at a particular time. For this reason Colwell's contingency results are highly consistent with the results of AC and FFT intensity. Colwell's contingency is not only consistent with the other indices but also adequate for measuring recurrence as defined above.
Figure 16 shows the correlation between AC and FFT intensity and between AC and Colwell's contingency from the WaterGAP model. All indices correlate well, although there are particular cases that deviate from the regressions. As mentioned in the methodology section, the threshold selected for AC was 0.75. For FFT intensity and Colwell's contingency, thresholds of 150 and 0.25, respectively, were selected to minimize the number of basins assigned to different classes. Table 5 shows the classification of basins from the different metrics.
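The mapping from recurrence values to a class label can be sketched as follows. The AC threshold of 0.75 and the letter order (Q, P, E, S, with L for low recurrence in all variables) follow the text; the function name and the example recurrence values are hypothetical, apart from the precipitation values of 0.88 and 0.65 quoted above for Yenisei and Ob.

```python
def classify_basin(recurrence, threshold=0.75):
    """Return a class label such as 'QPES', 'ES' or 'L'.

    recurrence: dict with keys 'Q' (runoff), 'P' (precipitation),
    'E' (evaporation) and 'S' (storage); values are recurrence measures.
    A variable counts as recurrent when its value exceeds the threshold.
    """
    order = ("Q", "P", "E", "S")
    label = "".join(v for v in order if recurrence[v] > threshold)
    return label or "L"   # 'L': low recurrence in all variables

# Illustrative values only; Q, E and S entries are invented for the example.
yenisei_like = {"Q": 0.90, "P": 0.88, "E": 0.95, "S": 0.85}
ob_like = {"Q": 0.80, "P": 0.65, "E": 0.95, "S": 0.85}
all_low = {"Q": 0.20, "P": 0.30, "E": 0.40, "S": 0.10}

labels = [classify_basin(b) for b in (yenisei_like, ob_like, all_low)]
print(labels)
```

With four binary decisions the scheme yields the 16 possible classes described in the paper; the same function can be reused with the FFT or contingency thresholds by passing a different `threshold`.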
The FFT procedure represents a time series by fitting sine and cosine functions; therefore, the FFT intensity will be higher for variables following a sinusoidal pattern. Figure 17 shows examples of different periodograms with their respective partial time series and climatologies. Figure 17a shows the example of evaporation in Changjiang, for which a highly sinusoidal pattern results in high AC and FFT intensity. Figure 17b shows an example of low recurrence with low AC and FFT intensity. However, there are two examples where the FFT intensity indicates low recurrence while AC indicates high recurrence. First, Fig. 17c (Congo evaporation) shows a bimodal pattern which has a high AC but low FFT intensity; since the peaks in evaporation appear at different frequencies, the intensity at a period of 12 months becomes weaker and other high intensities appear at different frequencies. The second example, shown in Fig. 17d, occurs in basins in the sub-arctic region where the highest volume of runoff comes from snowmelt in early spring but the peak in precipitation takes place during summer, creating a lump in the recession of the runoff climatology. This second lump reduces the intensity at a period of 12 months and increases the intensity at other frequencies seen on the periodogram. For both of these cases with deviations from a sinusoidal function, AC better represents the concept of recurrence because, if the same pattern repeats, AC at lags that are multiples of 12 will be high regardless of the shape of the pattern.
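The effect of a bimodal pattern on the two measures can be illustrated with synthetic series. In this sketch the "FFT intensity" is assumed to be the periodogram power of the standardized series at the 12-month period; the normalization used in the study is not given in this section, so the absolute values here are not comparable with the threshold of 150. Function names are hypothetical.

```python
import numpy as np

def fft_intensity_12(x):
    """Periodogram power at a 12-month period (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()                 # standardize the series
    power = np.abs(np.fft.rfft(z)) ** 2 / z.size
    freqs = np.fft.rfftfreq(z.size, d=1.0)       # cycles per month
    return float(power[np.argmin(np.abs(freqs - 1.0 / 12.0))])

def lag12_corr(x):
    """Pearson correlation between the series and itself 12 months later."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-12], x[12:])[0, 1])

t = np.arange(12 * 30)                           # 30 synthetic years
unimodal = np.sin(2.0 * np.pi * t / 12.0)        # one peak per year
bimodal = unimodal + np.sin(2.0 * np.pi * t / 6.0)  # two peaks per year

p_uni, p_bi = fft_intensity_12(unimodal), fft_intensity_12(bimodal)
r_uni, r_bi = lag12_corr(unimodal), lag12_corr(bimodal)
print(f"unimodal: intensity={p_uni:.1f}, lag-12 correlation={r_uni:.3f}")
print(f"bimodal : intensity={p_bi:.1f}, lag-12 correlation={r_bi:.3f}")
```

Both series repeat exactly every 12 months, so their lag-12 correlation is the same, but half of the variance of the bimodal series sits at the 6-month period, which halves the intensity at 12 months.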
Colwell's contingency also has a high correlation with AC. However, Colwell's index is mainly used for qualitative descriptions in ecological sciences, although it can be adapted to time series when intervals of the variable are used as states. Limitations of the use of Colwell's index for hydrological time series have been extensively discussed by Gan et al. (1991) and include the dependence of the results on the number of classes selected and the tendency for higher contingency values with shorter record lengths. These limitations are intrinsic to Colwell's index because it requires the discretization of data.
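For readers unfamiliar with Colwell's indices, the following sketch computes constancy (C), contingency (M) and predictability (P_d = C + M) from a monthly series using the standard entropy-based definitions of Colwell (1974). The discretization into equal-width states and the number of states are assumptions made for illustration (and, as noted above, the results depend on that choice); the function name and the synthetic series are hypothetical.

```python
import numpy as np

def colwell_indices(monthly, n_states=4):
    """Return (constancy, contingency, predictability) for a monthly series."""
    x = np.asarray(monthly, dtype=float).reshape(-1, 12)   # years x months
    # Assumption: equal-width states between the series minimum and maximum.
    edges = np.linspace(x.min(), x.max(), n_states + 1)
    states = np.digitize(x, edges[1:-1])                   # values 0..n_states-1
    counts = np.zeros((n_states, 12))                      # states x months
    for month in range(12):
        counts[:, month] = np.bincount(states[:, month], minlength=n_states)
    total = counts.sum()

    def entropy(c):
        prob = c[c > 0] / total
        return float(-(prob * np.log(prob)).sum())

    h_time = entropy(counts.sum(axis=0))    # H(X): column (month) totals
    h_state = entropy(counts.sum(axis=1))   # H(Y): row (state) totals
    h_joint = entropy(counts.ravel())       # H(XY)
    log_s = np.log(n_states)
    constancy = 1.0 - h_state / log_s
    contingency = (h_time + h_state - h_joint) / log_s
    return constancy, contingency, constancy + contingency

n_years = 36
# Arid-like series: zero except for one peak whose month changes every year.
arid = np.zeros((n_years, 12))
for year in range(n_years):
    arid[year, (5 * year) % 12] = 10.0
# Strictly periodic series: the state is fully determined by the month.
periodic = np.tile(np.arange(12) % 4, n_years).astype(float)

c_arid, m_arid, p_arid = colwell_indices(arid.ravel())
c_per, m_per, p_per = colwell_indices(periodic)
print(f"arid    : C={c_arid:.2f}, M={m_arid:.2f}, P_d={p_arid:.2f}")
print(f"periodic: C={c_per:.2f}, M={m_per:.2f}, P_d={p_per:.2f}")
```

The arid-like series has high constancy and therefore high predictability, but its contingency is zero because the peak is not tied to a particular month. The strictly periodic series shows the opposite behavior, which is why contingency is the component that matches the concept of recurrence.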

Result dependency on model structure
Model differences and uncertainties have been widely discussed in the literature on model intercomparison (e.g., Haddeland et al., 2011). The main differences among the models are attributed to their evaporation and snow modules, as well as to their storage components. Here we briefly discuss how structural differences among the models affect the calculation of recurrence. Figure 18 shows box plots of the ranges of recurrence for every variable in all basins from the eight different models.
Marginal differences in recurrence are found in most of the tropical humid basins in the QPES class. Larger differences are observed in the storage variables in these basins. In the case of the Brahmaputra, the GWAVA and MPI-HM models are outliers in the recurrence of storage, yielding 0.03 and 0.55, respectively, while the other models range between 0.92 and 0.96. Haddeland et al. (2011) highlighted the overestimation of evaporation in this basin by MPI-HM due to the use of the Thornthwaite evaporation scheme. This leads to higher interannual variations in the storage components due to higher evaporation. In the case of GWAVA, the storage series for this basin shows a cyclic increase in storage until it abruptly decreases to a lower volume. This pattern is only observed in the snow component of storage, which is highly overestimated in GWAVA compared to the other models. The MATSIRO model has a deep groundwater tank which in general generates less seasonal variation in runoff (Haddeland et al., 2011). This affects the recurrence calculation, and in many basins recurrence in runoff changes from high in all other models to low in MATSIRO.
Models in the temperate zone show larger differences, mostly in runoff and storage recurrence. This is due to the variety of climatologies present in this zone and to the presence of snow. Snowfall is treated differently in each GHM, with different thresholds for snowfall, and the models use different melting schemes. These differences mainly affect basins that are around the threshold zone between 0 and 1 °C, where precipitation is partitioned into snow and rain and melting processes start (Haddeland et al., 2011). Despite these large differences, most models indicate the same class for most basins. In sub-arctic basins, where the influence of snow is much more important, the differences are small, but WaterGAP shows the lowest recurrence of all models. This is possibly due to the degree-day method. Temporal and spatial variations in snow content are larger in the WaterGAP model, which decreases recurrence. However, the relation between storage recurrence and snow amount is preserved, as basins with higher snow content also exhibit higher recurrence.
Finally, arid basins have wide uncertainty due to the differences in the partition between evaporation and runoff in each model. MATSIRO is an outlier in having high recurrence in evaporation. When inspecting the time series of storage for these catchments, a marked decreasing trend was found. This can be partially attributed to the deep groundwater tank that keeps water available for evaporation despite the lack of water supply through precipitation. Evaporation follows a seasonal cycle in MATSIRO with increasing recurrence.
The two models with storage subdivided into more components are WaterGAP and LPJmL, which feature mainly a groundwater tank and a surface storage tank. The groundwater tank stores water infiltrated from soil moisture deeper underground and drains directly into a lake tank. This groundwater component represents a small volume, simulating only a dynamical part of the groundwater that actually exists in a basin. Deep groundwater is not represented by these two models. The surface water storage component includes tanks for lakes, wetlands and river channels. These tanks receive direct runoff, flow from the groundwater tank and direct precipitation as input. The outflow from the surface water component is given by discharge onto a downstream cell. Due to the inclusion of a river channel tank, it is possible that our results are affected by the time lag in lengthy river channels. However, among the differences in the results shown in Figs. 6 and 18, none could be attributed to a time lag due to the length of the river. Further analysis should be performed in order to understand the effects of the inclusion of river channel storage on the measures of recurrence.

Future application of the classification framework
By deriving the classification framework based on recurrence, we were able to discuss the interactions among the hydrologic variables affecting their temporal patterns. As one future application of the proposed classification, we would like to analyze, in a mechanistic way, the impact of projected climate change on hydrologic variables depending on the classes. One mechanistic approach to analyzing hydrological changes is the quantification of the climate elasticity of runoff (Sankarasubramanian et al., 2001; Yang and Yang, 2011; Vano et al., 2012). We believe that sensitivity studies could be further enhanced with this kind of classification, which highlights dominant hydrologic processes, especially by incorporating a storage component.
The inclusion of storage, together with the explanation of its temporal variations, is one of the features of this study. The approach adds to previous studies that have identified storage as an important component for runoff generation (Black, 1997; Sayama et al., 2011) and highlighted its interaction with precipitation and evaporation temporal patterns (Jothityangkoon and Sivapalan, 2009). Our classification shows how storage is controlled and how it controls runoff in different classes. We identified that for particular classes the effects of precipitation and potential evaporation transfer more directly to runoff, while in other classes runoff is buffered by storage. Our framework can be used to define a benchmark state of basins and to analyze shifts in classes or changes in temporal variations due to hydrological change, similar to Coopersmith et al. (2014). For this type of study, EU-WATCH provides excellent data sets for the 20th century and projections into the 21st century to analyze the change in temporal patterns under different conditions.

Conclusions
This paper presented a framework of hydrologic classification applicable to large-scale river basins based on monthly temporal variations of precipitation, evaporation, storage and runoff. The classification was derived from the concept of hydrological recurrence, a metric defined as the degree to which a monthly hydrological variable returns to the same state in subsequent years. The recurrence was measured using the mean of autocorrelations (AC) at lags of multiples of 12 months, from 12 to 60 months, the intensity of fast Fourier transforms (FFT intensity) and Colwell's contingency index. These measures were calculated at global gridded scale (0.5°) and for the 35 largest basins of the world based on the model forcing or output of the EU-WATCH data set.
The recurrence of individual variables is generally different in different latitudinal regions. For the recurrence in precipitation, the seasonality of moisture plays an important role, while for that in evaporation, the effect of seasonality in energy is more dominant. Storage recurrence is more dependent on the seasonality of moisture in the tropics and on snow at higher latitudes. Finally, all combinations control the characteristics of the recurrence in runoff.
According to our proposed classification, which results in 16 possible classes from the combinations of high or low recurrence of the four variables, only 10 classes are present in the river basins studied. In the tropical region, recurrence in runoff and storage essentially depends on aridity. Humid basins are highly recurrent in all variables. Drier basins have low recurrence in runoff, but storage recurrence depends on the timing of the peaks in precipitation and E_P.
In the temperate region, evaporation is always recurrent due to high seasonality, while precipitation shows low recurrence due to the basins' aridity. In these basins, the timing of the peaks in P and E_P also influences the recurrence in Q and S.
In the sub-arctic region, evaporation is again highly recurrent due to extreme seasonality. Precipitation is recurrent in areas with an oceanic influence. Storage is recurrent in the basins with larger amounts of snow, whose melting processes dominate the patterns of runoff. As a result, the runoff recurrence is high in this region, while the storage recurrence varies between areas. Therefore, the river basins are mainly classified into QPES, QPE, QES or QE depending on their combinations.
The above results were primarily obtained from the analysis of the AC metric with WaterGAP model output. However, the other two metrics, FFT intensity and Colwell's contingency, and the other models also showed essentially consistent results.
Overall, the presented approach is an attempt to define basin similarity accounting for the temporal patterns of water balance components. River basins in the different classes are likely to behave differently even under similar changes in climate control. The same framework may be applied to long-term time series data from different sources, including GCM future projections. Furthermore, by using long-term time series broken down into partial time series, the proposed framework may identify a hydrologic regime shift from one class to another, as well as the characteristics of hydrologic sensitivity in different classes. For this kind of study, EU-WATCH provides useful data sets for projecting future hydrologic variables.
Finally, there are several limitations that are intrinsic to the classification framework. Although some of the combinations that were not found are considered infeasible (e.g., only recurrent runoff), there are other classes that may be found if the sample of basins is extended. The classification also does not consider landscape controls on hydrological processes, effects of land use and human interactions, or other important factors that also influence the temporal variability of hydrological variables. The framework currently uses the spatial average of large river basins, leaving aside heterogeneity in climatic and geographic characteristics. Downscaling to smaller sub-basins can bring insights not only into the behavior at smaller scales but also into how the different sub-basins add up to create a general pattern in the large-scale basins. Even though the presented method is not a definitive or unique classification framework, the analysis comparing different classes provides useful insights into the functions of large river basins in the world.
The Supplement related to this article is available online at doi:10.5194/hess-19-1919-2015-supplement.

Figure 2. Location of the basins included in the analysis with an assigned identification number. The latitude reference lines identify the latitudes that divide the regions, geographically separating the basins.

Figure 3. Hydrological classification tree. Color codes indicate the colors used in further maps to identify the classes to which basins belong. Dashed lines indicate paths into classes that were not found in the studied basins.

Figure 4. Recurrence in main hydrological variables at global scale: (a) precipitation, (b) evaporation, (c) storage and (d) runoff. The map identifies the areas with lowest recurrence (< 0.5), low recurrence (0.5-0.75) and high recurrence (> 0.75). Reference latitude lines identify the divisions in latitudinal regions where particular conditions and similarities were found to exist.

Figure 5. Basin location map with identification by class. A threshold for defining high recurrence or low recurrence was set at 0.75. Latitude regions were defined between the reference lines shown on the map for both hemispheres, delimiting the tropical region between 0.0 and 23.5°, the subtropical region between 23.5 and 35.0°, the temperate region (35.0-55.0°) and the sub-arctic and Arctic regions (> 55.0°).

Figure 6. Radar charts depicting the results of recurrence for each variable in each individual basin. Results from the WaterGAP model are highlighted in red, the model mean is shown as a solid black line, the interquartile range is shaded in gray, and the maximum and minimum values are shown with a dashed black line.

Figure 7. Variable climatologies for selected basins for each class and region. The charts present a particular basin for each of the 10 classes found, sorted by region. Precipitation, evaporation, runoff and potential evaporation share the left vertical axis, and storage is shown on the right vertical axis.

Figure 8. Monthly time series of selected basins in the tropics from each class: (a) Amazon (QPES), (b) Orinoco (QPS), (c) Zambezi (PES), (d) Congo (PE). The graphs exemplify time series with high or low recurrence depending on the classification. The averaged AC coefficient is provided in the top right corner of each graph.

Figure 9. Climatology of storage and the various storage components for sub-arctic basins.

Figure 12. Relationship between recurrence and seasonality from all of the time series corresponding to each variable in each basin.

Figure 13. Seasonal climatologies of precipitation in Yenisei and Ob river basins (a), long-term mean (b), and (c) 23-year precipitation in Yenisei and Ob river basins. (b) and (c) show the minimum, maximum, quartiles and mean for each month.

Figure 14. Schematic time series representing different levels of recurrence, variability and seasonality.

Figure 15. Relation of aridity and timing of peaks with recurrence in runoff and storage. (a) Timing of P and E_P with recurrence in runoff, (b) relation of timing of peaks in P and E_P and recurrence in storage, (c) relation of aridity and recurrence in runoff, and (d) relation between aridity and recurrence in storage.

Figure 18. Model differences. Box plots show the recurrence measure for each variable in each basin, displaying an interquartile uncertainty band, WaterGAP marked by the red spot, the mean highlighted by the black mark, and the maximum and minimum values.
Schematic representation of different levels of recurrence in runoff (Q) time series from Mekong and Grande river basins.

Table 2. Summary of class characteristics. Desynchronization of the precipitation and E_P cycles allows for filling and emptying of storage during rainy and dry seasons, respectively. Runoff is only generated for extreme precipitation due to lack of saturation in storage.
L is low recurrence in all variables.

Table 3. Component contribution ratio (CCR) for basins located in the sub-arctic region. The CCR is calculated as in Kim et al. (2009).

Hydrol. Earth Syst. Sci., 19, 1919-1942, 2015, www.hydrol-earth-syst-sci.net/19/1919/2015/

Table 4. Results of Colwell's indices (constancy, C; contingency, M; and predictability, P_d) for all variables in arid basins. Constancy has high values because the variables are constantly low, which increases the total predictability index.