Hydrological Hysteresis and Its Value for Assessing Process Consistency in Catchment Conceptual Models

While most hydrological models reproduce the general flow dynamics, they frequently fail to adequately mimic system-internal processes. In particular, the relationship between storage and discharge, which often follows annual hysteretic patterns in shallow hard-rock aquifers, is rarely considered in modelling studies. One main reason is that catchment storage is difficult to measure, and another one is that objective functions are usually based on individual variables time series (e.g. the discharge). This reduces the ability of classical procedures to assess the relevance of the conceptual hypotheses associated with models. We analysed the annual hysteric patterns observed between stream flow and water storage both in the saturated and unsaturated zones of the hillslope and the riparian zone of a headwater catchment in French Brittany (Environmental Research Observatory ERO AgrHys (ORE AgrHys)). The saturated-zone storage was estimated using distributed shallow groundwater levels and the unsaturated-zone storage using several moisture profiles. All hysteretic loops were characterized by a hysteresis index. Four conceptual models, previously calibrated and evaluated for the same catchment, were assessed with respect to their ability to reproduce the hysteretic patterns. The observed relationship between stream flow and saturated , and unsaturated storages led us to identify four hydrological periods and emphasized a clearly distinct behaviour between riparian and hillslope groundwaters. Although all the tested models were able to produce an annual hystere-sis loop between discharge and both saturated and unsatu-rated storage, the integration of a riparian component led to overall improved hysteretic signatures, even if some misrep-resentation remained. Such a system-like approach is likely to improve model selection.


Introduction
Rainfall-runoff models are tools that mimic the low-pass filter properties of catchments.Specifically, they aim at reproducing observed stream flow time series by routing time series of meteorological drivers through a sequence of mathematically formalized processes that allow a temporal dispersion of the input signals in a way that is consistent with the modeller's conception of how the system functions.The core of most models, in particular in temperate, humid climates dominated by some type of subsurface flow, is a series of storage-discharge functions that, in the most general terms, express system output (i.e.discharge and evaporation) as a function of the system state (i.e.storage), thereby generating a signal that is attenuated and lagged with respect to the input signal (i.e.precipitation).
However, modelling efforts on the catchment scale typically face the problem that, on that scale, neither integrated internal fluxes nor the integrated storage and the partitioning between different storage components at a given time can be easily observed within limited uncertainty.Indeed, indicators of catchment storage such as groundwater levels and soil water content can be highly variable in space and exhibit heterogeneous spatio-temporal dynamics.While spatial aggregation of storage estimates (e.g.catchment averages) in lumped models may lead to a loss of crucial information Published by Copernicus Publications on behalf of the European Geosciences Union.and thus to overly simplistic representations of reality, allowing for the explicit incorporation of spatial storage heterogeneity in (semi-)distributed models may prove elusive in the presence of data error and the frequent absence of detailed spatial knowledge of the properties of the flow domain.A time series of groundwater table levels from a single piezometer is not representative of the behaviour of the groundwater, even on the hillslope scale; therefore, it is difficult to link it with either a reservoir volume simulated by a lumped model or an average water table level of a grid point simulated by a fully distributed model.These problems were recently addressed in some studies that intended to assess catchment storage using all available data (Mc-Namara et al., 2011;Tetzlaff et al., 2011) and showing the importance of this storage in thresholds observed in the response of discharge to precipitation in catchments.For example, Spence (2010) argued that the observed nonlinear relationships between stream flow and catchment storage (i.e.no unique storage-discharge relations) are the manifestation of thresholds occurring in catchment runoff generation.Thus, depending on the structure of the system, storage-discharge dynamics can exhibit hysteretic patterns, i.e. the system response depends on the history and the memory of the system (e.g.Everett and Whitton, 1952;Ali et al., 2011;Gabrielli et al., 2012;Haught and van Meerveld, 2011).Andermann et al. (2012) found a hysteretic relationship between precipitation and discharge in both glaciated and unglaciated catchments in the Himalaya Mountains that was shown to be due to groundwater storage rather than to snow or glacier melt.Hrachowitz et al. (2013a), demonstrating the presence of hysteresis in the distribution of water ages, highlighted the importance of an adequate characterization of all systemrelevant internal states at a given time to predict the system response within limited uncertainty as flow can be generated from different system components depending on the wetness state of the system.
In catchment-scale rainfall-runoff models, the need for calibration remains inevitable (Beven, 2001) due to the presence of data errors (e.g.Beven, 2013) and to the typically oversimplified process representations (e.g.Gupta et al., 2012).In spite of their comparatively high degrees of freedom, such models are frequently evaluated only against one single observed output variable, e.g.stream flow.Although the calibrated models may then adequately reproduce the output variable, model equifinality (e.g.Savenije, 2001) will lead to many apparently feasible solutions that do not sufficiently well reproduce system-internal dynamics as they are mere artefacts of the mathematical optimization process rather than suitable representations of reality (Gharari et al., 2013;Hrachowitz et al., 2013b;Andréassian et al., 2012;Beven, 2006;Kirchner, 2006).The realisation that there is a need for multivariable and multiobjective model evaluation strategies to identify and discard solutions that do not satisfy all evaluation criteria applied is therefore gaining ground (e.g.Freer et al., 1996;Gupta et al., 1998, 2008, Gascuel-Odoux et al., 2010) as this will eventually lead to models that are not only capable of reproducing the observed output variables (e.g.stream flow) but that also represent the systeminternal dynamics in a more realistic way (Euser et al., 2013).The value of such multivariable and/or multiobjective evaluation strategies has been demonstrated in the past, for example using groundwater levels (e.g.Fenicia et al., 2008;Molénat et al., 2005, Giustolisi andSimeone, 2006;Freer et al., 2004;Seibert, 2000;Lamb et al., 1998), soil moisture (Kampf and Burges, 2007;Parajka et al., 2006), saturated-area extension (Franks et al., 1998), snow cover patterns (e.g.Nester et al., 2012), remotely sensed evaporation, (e.g.Mohamed et al., 2006;Winsemius et al., 2008), stream flow at subcatchment outlets (e.g.Moussa et al., 2007) and even water quality data such as, e.g., chloride concentrations (Hrachowitz et al., 2011), atmospheric tracers (Molénat et al., 2013) or nitrates and sulfate concentrations (Hartmann et al., 2013a) and water isotopes such as δ 18 O (Hartmann et al., 2013b).However, most studies using multiple response variables only evaluate them individually to identify Pareto-optimal solutions.This practice may result in the loss of critical information, such as the timing between the multiple variables.In other words it is conceivable that model calibration leads to Pareto-optimal solutions with adequate model performance for all variables while at the same time misrepresenting the dynamics between these variables.Instead, using a synthetic catchment property (Sivapalan et al., 2005) or a hydrological signature (Wagener and Montanari, 2011;Yadav et al., 2007), combining different variables into one function, may potentially serve as a instructive diagnostic tool, a calibration objective or even as a metric for catchment classification (Wagener, 2007).
Hysteretic patterns between hydrological variables are potentially good candidates to build such tools.The objective of this paper is to explore (i) the potential of using annual hysteric patterns observed between stream flow and water storage both in the saturated and unsaturated zones of the hillslope and of the riparian zone for characterizing the hydrological functioning of a small headwater catchment in French Brittany (Environmental Research Observatory ERO AgrHys (ORE AgrHys)), (ii) to which degree a suite of conceptual rainfall-runoff models with increasing complexity, which were calibrated and evaluated for this catchment in previous work using a flexible modelling framework (Hrachowitz et al., 2014), can reproduce the observed storagedischarge hysteresis and (iii) whether the use of the storagedischarge hysteresis can provide additional information for model diagnostics compared to traditional model evaluation metrics. 2 Materials and methods
The climate is oceanic, with a mean annual temperature of 11.9 • C with a minimum of 5.9 • C in winter and a maximum of 17.9 • C in summer.Mean annual rainfall over the period 1992-2012 is 1113 mm (±20 %) and mean annual Penman potential evapotranspiration (PET) is 700 mm (±4 %).Mean annual drainage is 360 mm (±60 %) at the outlet.There is a high water deficit in the annual budget almost every year due to underflows below the outlet (Ruiz et al., 2002).The catchment lies under granite (leucogranodiorite of Plomelin), the upper part of which is weathered from 1 to more than 20 m deep.Soils are mainly sandy loam with an upper horizon rich in organic matter; depths are between 40 and 90 cm.Soils are well drained except in the bottomlands, which represent 7 % of the total area.Agriculture dominates the land use, with 86 % of the total area covered by grassland, maize and wheat, none of them irrigated.The base flow index is about 80 to 90 %; thus, the hillslope aquifer is the main contributor to stream flow (Molénat et al., 2008;Ruiz et al., 2002).Both stream flow and shallow groundwater tables exhibit a strong annual seasonality in this catchment (Figs. 2 and 3a).

Data
Meteorological data were recorded in an automatic weather station (CIMEL, Fig. 1) which provides hourly rainfall and variables required to estimate daily Penman PET (net solar radiation, air and soil temperatures, wind speed and direction).Discharge was calculated from water level measurements at the outlet (Fig. 1) using a V-notch weir equipped with a shaft encoder with integrated data logger (OTT Thalimedes) and recorded every 10 min since 2000 (E3).Groundwater levels have been monitored every 15 min since 2001 in three piezometers -F1b, F4, and F5b (Fig. 1) -using vented pressure probe sensors (OTT Orpheus Mini).Moisture in the unsaturated zone has been recorded every 30 min since July 2010 at seven depth (25, 55, 85, 125, 165, 215, and 265 cm) and at two locations (sB1 and sB2; Fig. 1), using capacitive probes which provide volumic humidity based on frequency domain reflectometry (Environ-Scan SenteK).Due to technical problems, data are missing in December 2012 and January 2013, so only 2 complete water years were available (2010-2011 and 2011-2012).In summary, stream discharge water table levels were considered for the years 2002-2012, and soil moisture was considered for the years 2010-2012.

Catchment storage estimates
In order to obtain a proxy for the saturated-zone storage on the catchment scale, the time series of groundwater level were normalized between their minimal and maximal values over the 10 years of records so that the normalized value lies between 0 and 1.The resulting normalized variable exhibited very similar dynamics among all the piezometers (see Fig. 2a).However, the piezometer located in the riparian zone (F1b) exhibited variations at a higher frequency, especially during the winter.Therefore, in the following, we used the average of the normalized level in the two hillslope piezometers (F5b, F4) as a proxy for the hillslope groundwater storage dynamics and the normalized level in the riparian piezometer as a proxy for the riparian groundwater storage dynamics.
In order to obtain a proxy for the unsaturated-zone storage, moisture time series were also normalized using the minimal and maximal values observed in all the sensors of the two profiles over the 2 water years with complete records, setting the minimal value as 0 and the maximal value as 1.As the normalized unsaturated storage variables obtained followed very similar trends and dynamics, we used, in the following, an average of the normalized unsaturated-zone storage among all the measurement points (depths and profiles) (Fig. 2b).The two profiles are located on the upslope and downslope parts of the hillslope.Thus, we assumed that averaging their normalized values will allow us to build a proxy for the dynamics of the unsaturated-zone storage on the whole hillslope.

Hysteresis indexes
Studies on hysteretic relationships in catchments generally focus on qualitative descriptions of patterns associated with a cross-correlation analysis between the two variables (Frei et al., 2010;Hopmans and Bren, 2007;Jung et al., 2004;Salant et al., 2008;Schwientek et al., 2013;Spence et al., 2010;  and F5b) and in the riparian zone (F1b) and (b) average, maximum and minimum unsaturated-zone storages for all the sensors in the two profiles in the Kerrien catchment.Velleux et al., 2008).Some authors proposed a typology of hysteretic loops based on their rotational direction, curvature and trend to identify solute controls during storm events (Butturini et al., 2008;Evans and Davies, 1998).For storagedischarge hysteresis on the annual scale, this approach is not sufficient as the same type of hysteretic loop is likely to happen for almost all the years when a strong seasonality exists and its pattern is repeated across years.This is the case in our study, where seasonality of groundwater level and discharge showed a strong unimodal pattern for all years, except 2011-2012, which was bimodal (Figs. 2 and 3a).Moreover, a preliminary cross-correlation analysis revealed that storage and stream flow are strongly correlated, and the cross-correlation value is the greatest for a lag time of 0 days (results not shown).
Quantitative descriptions of the hysteretic loop are also found in the literature, and various ways of computing hysteresis indexes (HIs) have been proposed, for example using the relative difference between extreme concentration values (Butturini et al., 2008) or using the ratio of turbidity values in rising and falling limbs of the storm hydrograph at the midpoint discharge value (Lawler et al., 2006).The latter authors argue that computing HIs by using midpoint discharge usually allows avoiding the small convolutions which are frequently observed at both ends of the hysteretic loop.
In this paper, as the hydrological variables exhibit a strong annual unimodal cycle, we calculated the hysteresis index each year as the difference between water storages at the dates of midpoint discharge in the two phases of the hydrological year -during the recharge period (R) and the recession period (r), i.e. respectively before and after reaching the maximal discharge Q max -as follows: where S(t) is the storage value at time t and Q(t) the stream flow value at time t.The midpoint discharge Q mid is defined as the mean value of discharge between Q 0 , the initial value at the beginning of the hydrological year (October), and Q max , the maximal value reached during that year: In order to reduce the impact of the quick variations of discharge or groundwater level due to individual storm events, we smoothed the time series using 7-day moving averages.
The strong seasonal discharge cycle led us to identify two occurrences of Q mid per year only -during the recharge period (t R ) and during the recession period (t r ) -while high and low stream flow values are taken several times per year as explained by Lawler et al. (2006).Computing the HI using the difference in storage was possible here because storage and stream flow values vary among years within a narrow range of magnitude, while Lawler et al. (2006) used the ratio because turbidity can differ by several orders of magnitude from one storm to the other.Computing the HI with the difference between the values of storage and not with their ratio allowed maintaining its sensitivity to the year-to-year variations of the width of the hysteretic loop.The difference in water storage dynamics in the unsaturated and saturated zones were approximated by the difference in normalized soil moisture content and by the difference in normalized groundwater level respectively.The HI gives two types of information: (i) its sign indicates the direction of the loop (anticlockwise loop induces a negative value of the HI, whereas a clockwise loop leads to a positive value of the HI) and (ii) its absolute value is proportional to the magnitude of the hysteresis (i.e. the width of the hysteretic loop).The HI is a proxy for the importance of lag time response between variations in catchment storages (unsaturated and saturated) and stream discharge; its sign indicates whether storage reacts before or after the stream flow.Therefore, it can be used for comparing the capacity of the different models to reproduce to some extent the observed storagedischarge relationships.The normalization of the observed variables related to the storage (here either groundwater level or soil moisture) has no effect on the sign of the HI; the HI values are only being divided by the maximal amplitude ob- served in the storage during the whole period.Therefore, as long as the normalization is applied to the whole period (to all years and to both measurements and simulations), it does not affect the interpretation related to absolute values of the HI.

Models
In previous work, a range of conceptual models was calibrated and evaluated for the Kerrien catchment in a step-wise development using a flexible modelling framework (see Hrachowitz et al., 2014).This section aims at summarizing the results of this previous study as they are used as a basis for the present work.In this previous study, adopting a flexible stepwise modelling strategy, 11 models with increasing complexity, i.e. allowing for more process heterogeneity, were calibrated and evaluated for the study catchment.Four of these 11 models (hereafter referred to as M1 to M4; details given in Tables 1 and 2) were selected for the present work as they correspond to the sequence of model  architectures that provide the most significant performance improvements among the tested set-ups.As a starting point and benchmark, Model M1 with seven parameters, resembling many frequently used catchment models, such as HBV, was used (e.g.Bergström, 1995).The three boxes represent respectively an unsaturated zone, a slow-responding and a fast-responding reservoir.In Model M2, additional deep infiltration losses are integrated from the slow store to take into account the significant groundwater export to adjacent catchments in this study catchment as indicated by the observed long-term water balance (Ruiz et al., 2002).This is done by adding a second outlet together with a threshold to this storage to allow for continued groundwater export from a storage volume below the stream during zero-flow conditions, i.e. when the stream runs dry.As riparian zones frequently exhibit a distinct hydrological functioning (e.g.Molénat et al., 2005;Seibert et al., 2003), indicated in the study catchment by distinct response dynamics in the riparian piezometers (Martin et al., 2006), Models M3 and M4 additionally integrate a wetland/riparian zone component, composed of an unsaturated-zone store and a fast-responding reservoir, parallel to the other boxes.The riparian unsaturated zone generates flow using a linear function in M3 and a nonlinear func-tion in M4.The complete set of water balance and constitutive model equations of the four models is listed in Table 1, while the model structures are schematized in Table 2.

Calibration and evaluation
This section is also a summary of the findings of Hrachowitz et al. (2014) that served as a basis for this study and does not consist of the results of the current study.The models have been calibrated for the period 1 October 2002-30 September 2007 after a 1-year warm-up period, using a multiobjective calibration strategy (e.g.Gupta et al., 1998) based on Monte Carlo sampling (10 7 realizations).The uniform prior parameter distributions used for M1-M4 are provided in Table 3.To reduce parameter and associated predictive uncertainty, the models were calibrated using a total of four calibration objective functions (see Table 4), i.e. the Nash-Sutcliffe efficiencies (Nash and Sutcliffe, 1970) for stream flow (E NS,Q ), for the logarithm of the stream flow (E NS,log(Q) ) and for the flow duration curve (E NS,FDC ) as well as the volumetric efficiency for stream flow (V E,Q ; Criss and Winston, 2008).To facilitate a clearer assessment, the calibration objective functions (n = 4) were combined in a (3) As mathematically optimal parameter sets are frequently hydrologically suboptimal, i.e. unrealistic (e.g.Beven, 2006), all parameter sets within the 4-dimensional space spanned by the calibration Pareto fronts, as approximated by the cloud of sample points, were retained as feasible.
The calibrated models were then evaluated against their respective skills to predict the system response with respect to a selection of 13 catchment signatures (described in Table 4) in a multicriteria posterior evaluation strategy.Figure 3 and Table 4 show the global performance D E of the four models in terms of the Euclidean distance to the perfect model, constructed from all calibration objective functions and evaluation signatures.Model M1 provided good performance in calibration on the objective functions while its validation performances were considerably decreased.Its ability to reproduce the different signatures showed that it failed in particular to reproduce flow in wet periods (such as the evaluation period in Fig. 3a) and groundwater dynamics.Model M2 led to calibration performances slightly lower than model M1 but higher validation performances.The hydrological signatures simulated by M2 exhibited lower uncertainties both in validation and calibration periods because of a better simulation of low-flow conditions and groundwater dynamics.Model M3 provided similar performances to M2 for calibration and for validation but with clearly reduced uncertainty bounds.Overall signature reproduction was improved because of a clear improvement of low-flow and groundwater-related signatures even if performance in calibration objective functions remained lower than that for model M1.Model M4 exhibited similar performances to the previous models both in calibration and validation periods but a better performance for the whole set of signatures and lower uncertainties.
More details on the model calibration and evaluation with respect to hydrological signatures can be found in Hrachowitz et al. (2014; note that M1, M2 , M3 and M4 presented in this study correspond respectively to M1, M6, M8 and M11 in the original paper).Within the obtained range of parameter uncertainty, the types of simulated hysteresis patterns were not affected by the parameter values but only by the model structures.Note that we restricted the following analysis only to the optimal parameter set in each case, first for the sake of clarity and also because, at this stage, our interest was in assessing the ability of model structures to reproduce the observed general features in hysteresis pat-   terns and not in quantifying their performance in fitting the observations.
In the present work the sensitivity of the hysteresis indexes to parameter uncertainty is investigated by computing the HI values for the all sets of feasible parameters.The 2-dimensional observed relationship between saturated storage in the hillslope (HSS) or in the riparian zone (RSS) and stream discharge (Q) for each year was hysteretic, highlighting the nonuniqueness of the response of discharge to storage depending on the initial conditions and a lag time between both variable dynamics, in particular during the recharge period, as illustrated in Fig. 4 for two contrasting water years.
The direction of the hysteretic loop was different depending on the topographic position of the piezometer: loops were always anticlockwise (leading to negative values of the HI) for the piezometer located at the top of the hillslope HSS-F5b(Q), mostly anticlockwise for the midslope piezometer HSS-F4(Q) and mostly clockwise (positive values of the HI) for the piezometer in the riparian zone RSS-F1b(Q) (Fig. 5).
In the riparian zone, storage at Q mid was usually lower in the recession period than in the recharge period, especially in dry years, leading to a positive HI.This is due to the fact that the riparian groundwater level increased early at the beginning of the recharge period, before the stream discharge, due to the limited storage capacity of the narrow unsaturated layer in bottomlands, reinforced by groundwater ridging, which is linked to the extent of the capillary fringe.However, the hysteretic loops were narrow, and, for wet years, the storage value during the recession period occasionally exceeded the value in the recession period without modifying the general direction of the hysteresis when looking at the whole pattern (e.g. in [2003][2004], see Fig. 4a).When this occurred at the time of Q mid , it led to a negative HI although absolute values remained small (Fig. 5) The hillslope groundwater responded later than the stream, due to the deeper groundwater levels and higher unsaturated storage capacity (Rouxel et al., 2011), both introducing a time lag for the recharge and thus for the groundwater response.This led to negative values of the HI as groundwater levels in recession periods were higher than in recharge periods for the same level of discharge (in particular at Q mid ).The loops were also wider in the hillslope, leading to high absolute values of the HI (Fig. 5).
Table 4. Hydrological calibration criteria and evaluation signatures.The performance metrics include the Nash-Sutcliffe efficiency (E NS ), the volume error (E V ) and the relative error (E R ).For all variables and signatures, except for Q, Qlow and GW, the long-term averages were used.1).b Describing the spectral properties of a signal and thus the memory of the system, the observed and modelled autocorrelation functions with lags from 1 to 100 d where compared.c Note that in catchments without long-term storage changes and intercatchment groundwater flow, long-term average RC equals the long-term average 1-E A (Table 1).
The intermediate behaviour of the midslope piezometer (F4), exhibiting varying patterns throughout the years, reflects the fact that the riparian zone extends spatially towards the hillslope and reaches a larger spatial extension during wet years.
Similar observations have been reported by other authors.For example, anticlockwise hysteresis between groundwater tables and discharge are observed by Gabrielli et al. (2012) in the Maimai catchment, while studies on riparian groundwater or river bank groundwater report clockwise hysteresis on the storm event scale (Frei et al., 2010;Jung et al., 2004).Similar patterns were also observed by Jung et al. (2004), who found that in the inner floodplain and in river bank piezometers, the hysteresis curve between the water table and river stage exhibits a synchronous response, while in the hillslope hysteresis, curves are relatively open as the water table is higher during the recession than during the rising limb.

Observations regarding hillslope: saturated and unsaturated storages vs. flow
Figure 6 shows the 3-dimensional relationship between hillslope saturated storage (HSS), unsaturated storage (HUS) and stream flow (Q) for the year 2010-2011.Four main periods can be identified, similar to what was outlined in recent studies (e.g.Heidbuechel et al., 2012;Hrachowitz et al., 2013a): three characterized the recharge period and the last one the recession period.First, stream flow was close or equal to 0 and was almost exclusively sustained by drainage of the saturated storage, while the unsaturated zone exhibited a significant storage deficit and only minor fluctuations due to transpiration and small summer rain events (dry period).As steadier precipitation patterns set in, here typically around November, the unsaturated-zone storage reached its maximal value relatively quickly, rapidly establishing connectivity with fast-responding flow pathways (wetting period).This led to a relatively rapid increase in stream flow while the saturated storage did not change much until the end of this  period as incoming precipitation first had to fill the storage deficit in the unsaturated zone before a significant increase in percolation could occur.A further lag was introduced by the time taken for water to percolate and eventually recharge the relatively deep groundwater.As soon as conditions were wet enough to allow for established percolation, the saturated storage eventually also responded, increasing faster than the stream flow (wet period), while unsaturated storage remained full.During the wet period (or high-flow period), no pattern appeared clearly because all storage elements were almost full and the responses of all the compartments were more directly linked to the short-term dynamics of rain events.Finally during the recession period (drying period), unsaturated storage decreased comparatively quickly by drainage and transpiration, while the saturated storage kept increasing for a while by continued percolation from the unsaturated zone before decreasing through groundwater drainage at a relatively slow rate.A similar pattern was also observed for 2011-2012 (not shown).
The unsaturated-zone storage followed a clockwise hysteresis loop with the stream flow and with the saturated-zone storage.The hysteresis indexes (Fig. 5, years 2010-2011 and 2011-2012) reflected these directions and showed that the hysteresis loops were narrower for unsaturated storage than for saturated storage, inducing smaller absolute values of the hysteresis indexes due to the small size of the unsaturated storage compartment compared to the saturated storage compartment.

Interpretation
There are three main hypotheses generally proposed to interpret storage-discharge hystereses in hydrology.The first one is related to the increase in transmissivity with the groundwater level due to the frequently observed exponential decrease in hydraulic conductivity with depth.However, this would lead to systematic clockwise hysteresis loops and can- not explain the anticlockwise patterns observed between hillslope saturated storage and stream flow.The second hypothesis proposed by Spence et al., 2010 is that during the recharge period, the groundwater storage not only increases locally (as measured by the piezometric variations), but the spatial extension of connected storage also increases gradually, while during the recession period, the storage decreases homogenously across the entire contribution area.This is likely for riparian groundwater and could explain the clockwise hysteresis observed on this piezometer but cannot explain the anticlockwise hysteresis observed in the hillslope groundwater.The third hypothesis is that dominant hydrological processes are different between recharge and recession periods.For instance, Jung et al. (2004) interpret their clockwise hysteresis in peatlands groundwater as the results of a stepwise filling process during the rising flows (fill and spill mechanism) opposed to a more gradual drainage of the groundwater during the recession combined with the first hypothesis result, similar to what was found by Hrachowitz et al. (2013a).This hypothesis of different hydrological pathways allows an adequate interpretation of the opposite directions of the observed hystereses.The recharge period is characterized by a quick filling of the unsaturated and saturated storages in the riparian zone, which is always close to saturation, while the saturated storage on the hillslope is not yet filling up (wetting period).Thus, the wetting period is characterized by an increase in stream flow, here mainly generated in the riparian zone, and eventual quick flows in the hillslope, while the hill-slope unsaturated zone reaches the storage capacity volume.At the beginning of the wet period, hillslope saturated storage fills and starts to contribute to the stream, along with riparian and fast flows.During the recession period (drying period), the hillslope saturated zone is the only compartment which continues to sustain stream flow.If this hypothesis is correct, there are three contributions to stream flow in the wet period, while, during the recession period, hillslope groundwater remains the only contributor to stream flow (cf.Hrachowitz et al., 2013a, see Fig. 7).This can explain the difference between storage values in recharge and recession periods.Finally, the hysteretic hydrological signature is not only related to the amount of stored water in the catchment but rather to where it is stored.
These results are consistent with previous studies: the distinction between riparian groundwater and hillslope groundwater components has also been identified in similar catchments (by Molénat et al. (2008) based on nitrate concentration analysis and by Aubert et al. (2013a) based on a range of solutes) and at other site (by Haught and van Meerveld (2011)) using such Q-S relationships and lag time analysis.

Sensitivity of the HI to initial conditions
Sensitivity to antecedent soil moisture conditions is often cited as an explanation for observed storage-discharge hysteresis and its variability between years.The initial levels of each store will obviously influence the time required to fill  them and consequently the duration of the successive periods identified in the whole recharge period.As only 2 years of data were available, it was not possible to define a relationship between the initial average soil moisture and the magnitude of the hysteresis indexes.However, the magnitude of the HI was lower for high initial values of average unsaturatedzone storage for both the saturated and unsaturated zones in 2011-2012 (Table 5).The HI for the midslope saturated zone (F4b) seemed to be more sensitive to these initial moisture conditions than the HI for the upslope saturated zone and unsaturated zone.Similarly, the width of the loop (absolute value of the HI) was not very sensitive to initial groundwater levels in the hillslope: although the larger absolute values of the HI were observed for the lower initial water table levels, no clear correlation was observed (Fig. 8).

Sensitivity of the HI to annual rainfall
For the saturated zone, the observed values of the HI were negatively correlated with the total annual rainfall for both the hillslope and the riparian zone, with a more negative slope for the hillslope (Fig. 9).Wet years (i.e.large values of annual rainfall) are generally associated with large values of annual maximal and midpoint stream flows and also with large values of groundwater table level, leading to larger saturated-storage values during the recession period, while the storage values during the recharge period do not change much from year to year.Thus, larger storage values at the time of midpoint discharge in the recession period led to smaller values of the HI (i.e larger absolute values for the hillslope, where hystereses are anticlockwise, and smaller absolute value of the HI for the riparian zone, where hystereses are clockwise).In the riparian zone, when rainfall and maximal drainage reached a very high value, it could lead to a saturated-storage value at the time of midpoint discharge in the recession period that was larger than the corresponding value during the recharge period, explaining the inversion of the sign of the HI for RSS(Q) in very wet years.

Hysteresis simulations
For all years, all models (M1-M4) exhibited a hysteretic relationship between stream flow and storage, as shown in Fig. 10 for the years 2003-2004 and 2007-2008, pertaining to the calibration and validation periods respectively.This means that all tested models introduced a lag time between catchment stores and the stream dynamics.Fig. 11a presents the observed and modelled average and standard deviation of the annual hysteresis indexes, for hillslope saturated storage vs. discharge HSS(Q), hillslope unsaturated storage vs. discharge HUS(Q), hillslope unsaturated storage vs. hillslope saturated storage HUS(HSS) and riparian saturated storage vs. discharge RSS(Q).As riparian saturated storage (RSS) is not modelled in M1 and M2, simulated RSS(Q) was available only for M3 and M4.For M1, the shape of the simulated hysteresis showed an overestimation of hillslope saturated storage (HSS) and of flow during dry years (e.g. the year 2007-2008 shown in Fig. 10).This was expected as we have seen that the model was unable to reproduce groundwater dynamics and the low signatures during the validation period (Fig. 3 and supplementary material).Simulated HI values were close to the observed ones for HSS(Q) (Fig. 11a).The simulated hysteresis indexes were small and negative for HUS(Q), while the observed values were large and positive.Simulated HI values for HUS(HSS) were also overestimated.These results show that, in model M1, the overestimation of the hillslope saturated storage was partially compensated by the underestimation of the hillslope unsaturated storage.This reveals the poor consistency of the model and explains why it was able to reach good performance in the calibration period but not in the validation period (Fig. 3).
For the model M2, the shape of the hysteresis loops showed a considerable underestimation of HSS and a large underestimation of stream flow in wet years (Fig. 10).Compared to M1, although the introduction of deep losses in M2 led to higher validation performances and better simulation of hydrological signatures (Fig. 3), the simulated HIs (Fig. 11a) worsened, suggesting a poorer model consistency with respect to internal hydrologic processes.For both models M3 and M4, the introduction of a riparian compartment improved the simulated hysteretic loops, due to a better simulation of stream flow in wet years, but HSS was still largely underestimated (Fig. 10).The mean HI values for HSS(Q) were close to the observed one, but the range of variation was smaller, indicating a reduced sensitivity to climate (Fig. 9).The mean values for HUS(Q) were clearly improved compared to M1 and M2 as the direction of the loop was clockwise as for the observations, although the values were still underestimated.The mean HI values for HUS(HSS) were also greatly improved.The shape of the simulated hysteresis loops between riparian saturated storage (RSS) and stream flow (Q) showed a large underestimation of RSS, especially during the recession period (Fig. 10c, d).This led to simulated HIs for RSS(Q) which are positive, like the observed ones, but also largely overestimated (Fig. 11a).Overall, these results suggest that for models including a riparian component, the underestimation of the hysteresis between HUS and Q was compensated for by an overestimation of the hysteresis between RSS and Q.This highlights that, despite a significant improvement in performances and improved hydrological signature reproduction, these models still involve a certain degree of inconsistency with respect to internal processes.However, M4 provided the most balanced performance considering hysteretic signatures between all storage components and strongly underlines the limitations of overly simplistic model architectures (e.g.M1) and the need for more complete representations of process heterogeneity.The hysteresis index sensitivity to parameter uncertainty increases with the number of parameters from M1 to M2 and then stays in the same range from M2 to M4 (Fig. 11b).This analysis confirms the importance of considering the hysteresis indexes both between saturated and unsaturated storage (HSS and HUS) to avoid accepting a wrong model.For example, considering only the performance regarding the HSS(Q) relationship could lead one to accept model M1, while its performance on HUS is lower and it is not able to reproduce the Riparian compartment hysteresis.For readability purposes, Fig. 11b only illustrates this sensitivity for the different HIs in the year of 2011-2012 but similar behaviour is observed every year.It shows that best behavioural parameter sets (bbp) lead to modelled HI values closer to the observed values than average modelled HI values.Using an additional calibration criterion related to the hysteresis could reduce the sensitivity of the HI to parameter uncertainty and lead to a narrow range of feasible parameter sets.

Sensitivity of modelled hysteresis indexes to annual rainfall
All models were able to represent the decrease in the hysteresis indexes with annual rainfall on the hillslope, the slope of the correlation getting closer to the observed one from M1 to M4 (Fig. 9).The introduction of deep-groundwater losses (M2) led to smaller saturated storage during recharge periods and increased the difference between saturated storage during recharge and recession periods at the time of midpoint discharge.However, as all models tended to overestimate low stream flow values, the slopes of the correlations between annual rainfall and the simulated HI were smaller than for the observed one.
In the riparian zone, the modelled trends were the inverse of the observed one.The modelled recessions were always very sharp (see Hrachowitz et al., 2014), and the simulated riparian storage dried up every year, explaining why saturated storage at the time of midpoint discharge during the recession periods was much greater than during the recharge periods.This led to a general overestimation of HI values, which were even stronger for wet years.This overestimation may be re-lated to an improper conceptualization of the riparian-zone functioning, which is never connected to the hillslope reservoir in the tested models.In reality, during high-flow periods, the observed hydraulic gradient increased along the hillslope, inducing a connection between riparian and hillslope reservoirs which are disconnected during low-flow periods.

Value of such internal signatures for model evaluation
The use of hydrological hysteretic signatures in model assessments led to conclusions that were consistent with the classical hydrological signatures used in Hrachowitz et al. (2014).However, model M2 was less able to reproduce the different hysteretic signatures, whereas it led to a real improvement regarding to the classical signatures in low flows.
Considering only the distance between observed and simulated hysteresis indexes on hillslope saturated storage and stream flow would lead one to select model M1.This highlights the fact that using saturated-storage dynamics alone can be deceptive for understanding the system response behaviour and that it is thus crucial to also consider the hysteretic signatures of unsaturated and riparian zones in a combined approach to develop a more robust understanding of the system.Here, hysteretic signatures of the unsaturated and riparian zones provided valuable additional assessment metrics regarding the performance of models M3 and M4 to represent the riparian zone.It was possible to identify when the model failed to represent processes, which processes were mostly compensating for missing ones and therefore why the model may provide some good performance for the wrong reasons.In this regard, the hysteresis index proved to be a useful proxy of hystereses themselves as it exhibited contrasted patterns sensitive to climate and localization within the catchment.

Perspectives: toward an integrated hydrological-signature-based modelling?
A general issue in model calibration is that, because of the overparameterization of hydrological models and because the objective functions generally only integrate one variable, such as the stream flow, automatic calibration techniques may lead to parameter sets which compensate for internal model errors.These parameter sets are mathematically correct but wrong from a hydrological point of view.The subsequent model should then be considered nonbehavioural (Beven, 2006).For instance, if storage properties are not taken into account well by the model, this is likely to lead to a wrong simulation of storage dynamics in response to precipitation.Thus, the parameterization using traditional objective functions can lead to compensation of these errors in order to simulate a discharge value close to the observed one while the storage is wrong.In such a case, a model able to represent the internal catchment behaviour will generate a wrong discharge value which is, however, consistent with the storage value and will be rejected in traditional calibration procedures.To handle this issue and in order to select behavioural models, one can use multiple objective functions (Gupta et al., 1998;Seibert and McDonnell, 2002;Freer et al., 2003), including a range of hydrological signatures to be reproduced or additional realism constraints (Kavetski and Fenicia, 2011;Yadav et al., 2007;Yilmaz et al., 2008;Euser et al., 2013;Gharari et al., 2013;Hrachowitz et al., 2014).We argue that, rather than increasing the number of constraints or objective functions which have to be satisfied, an alternative could be to use some objective functions based on a combination of different variables, such as stream flow and the groundwater level, soil moisture or stream concentrations.Among the possible combination of variables, objective functions based on the relative dynamics of storage in different spatial locations, such as riparian versus hillslope, might provide new insights into the catchment-internal processes.We suggest that such combined objective functions would be more constraining for model selection.Therefore, the present study is a first step which aims at highlighting the still underexploited potential of hydrological hysteresis.The next step would be to quantify these relationships through functions or several indexes usable in calibration criteria, such as the hysteresis index proposed in this study.Moreover, such criteria could be used in classification studies.Indeed, some studies in the literature present storage-discharge relationships for different catchments that show patterns that are similar or dissimilar to the ones we observed in the Kerrien catchment (Ali et al., 2011;Gabrielli et al., 2012).This signature may help to classify catchments in terms of dominant processes driving their behaviour.
A remaining difficulty with integrating storage into calibration or evaluation procedure in hydrological modelling is how to measure this storage.McNamara et al. (2011) and Tetzlaff et al. (2011) proposed using all available data from groundwater level monitoring, soil moisture records, water budget, modelling results and so on to estimate the storage in catchments.In this study, we used quite a dense network of piezometers and soil moisture measurements relative to the small size of the catchment.Promising ways to estimate spatial quantification of storage in catchments include remote sensing of soil moisture (Sreelash et al., 2013;Vereecken et al., 2008), gravimetric techniques (Creutzfeldt et al., 2012), geodesy and geophysical methods.The interest in such techniques would be to provide a spatially integrated vision of the catchment water content.
As for the different hydrological variables, the combination of hydrological and chemical variables appears relevant to investigating the hydrochemical behaviour of catchments.Hysteresis patterns between concentration and discharge have been largely documented for storm event characterization (Evans and Davies, 1998;Evans et al., 1999;Taghavi et al., 2011).Some studies also report similar patterns on the annual scale (e.g.Aubert et al., 2013b).Such hysteretic relationships have been observed also between water and chemistry in groundwater (Rouxel et al., 2011;Hrachowitz et al., 2013a), emphasizing a disconnection between water and solute dynamics that simple diffusion or partial mixing processes cannot explain.Stream water chemistry also exhibits particular seasonal cycles with different phasing and with discharge depending on the solutes (Aubert et al., 2013b).This provides extra information on the water pathways within the catchment.These relationships also appear to be powerful in constraining hydrochemical modelling.

Conclusions
A method to characterize and partially quantify the relationship between storages in a headwater catchment and stream flow throughout a year has been proposed.It allowed us to then assess the ability of a range of conceptual lumped models to reproduce this catchment-internal signature.Catchment storage has been approximated using a network of piezometric data and several unsaturated-zone moisture profiles to consider the storage in the saturated as well as in the unsaturated zones.
The observations showed that storage-discharge relationships in catchments can be hysteretic, highlighting a successive activation of different hydrological components during the recharge period, while the recession exhibits a fast decrease in unsaturated and riparian storage and a slow decrease in hillslope saturated storage which sustains the stream flow.Four periods have been identified in the hydrological year: (1) first, at the end of the dry period, rainfall starts to refill unsaturated storage; (2) in the wetting period, riparian unsaturated storage is filled and the saturated storage starts to supply the stream while hillslope unsaturated storage is still being replenished; (3) during the wet period, unsaturated storage in the hillslope is also filled and the saturated hillslope storage also feeds the stream.(4) Finally when rainfall declines, flow from the riparian groundwater recedes and, during the recession period, the stream discharge is sustained only by hillslope groundwater.Stream discharge and riparian and hillslope saturated storages exhibited different patterns of hysteresis, with opposite directions of the hysteretic loops.
The tested models were characterized by an increasing degree of complexity and also an increasing consistency, as shown in a previous study using classical hydrologic signatures.In this study, we showed that, if all of the models simulated a hysteretic relationship between storage and discharge, their ability to reproduce the hysteresis index also increased with model complexity.In addition, we suggest that, if classical hydrological signatures help to assess model consistency, the hysteretic signatures also help to identify quickly when and why the models give "right answers for the wrong reasons" and can be used as a descriptor of the internal catchment functioning.

Figure 1 .
Figure 1.Study site in west Brittany (indicated by the square near Quimper) and location of the monitoring equipments.The weather station is located 500 m north of the catchment.

Figure 2 .
Figure 2. Normalized (a) groundwater levels for piezometers in the hillslope (F4 and F5b) and in the riparian zone (F1b) and (b) average, maximum and minimum unsaturated-zone storages for all the sensors in the two profiles in the Kerrien catchment.

Figure 3 .
Figure 3. (a) Observed (red line) and modelled runoff for model set-ups (A) M1, (B) M2, (C) M3 and (D) M4 in calibration and independent evaluation (validation) periods.Modelled runoff shown as the most balanced solution (dark blue line) and the 5/95th uncertainty bounds (light blue shaded area).Adapted from Hrachowitz et al. (2014).(b) Overall model performance for all model set-ups (M1-M4) expressed as Euclidean distance from the "perfect model" computed from all calibration objectives and signatures with respect to calibration and validation periods.Triangles represent the optimal solution, i.e. the solution obtained from the parameter set with the lowest Euclidean distance during calibration.Box plots represent the Euclidean distance for the complete sets of all feasible solutions (the dots indicate 5/95th percentiles, the whiskers 10/90th percentiles and the horizontal central line the median).From Hrachowitz et al. (2014).

Figure 4 .
Figure 4. Examples of annual hysteretic loops for saturated-zone storage vs. stream flow which are clockwise in the riparian zone (a, b) and anticlockwise in the hillslope (c, d) for the wet year 2003-2004 (a, c) and the dry year 2007-2008 (b, d).

Figure 5 .
Figure5.Annual hysteresis indexes (HI) computed for the piezometers in the Kerrien catchment from 2002 to 2012.F5b is located upslope, F4 midslope and F1b downslope in the riparian area.RRS(Q) is the hysteresis between stream flow and riparian saturated-zone storage (measured at F1b).HSS-F5b(Q), HSS-F4(Q) and HSS(Q) are hystereses between stream flow and upslope (at F5b), midslope (at F4) and hillslope (average of F5b and F4) saturated storages respectively.HUS(Q) is the hysteresis between stream flow and hillslope unsaturated storage (HUS) (computed from the average of normalized volumic moisture sensors in profiles sB1 and sB2), and HUS(HSS) is that between the hillslope unsaturated-and saturated-zone storage (average of F5b and F4).

Figure 6 .
Figure 6.Evolution of stream flow (Q in mm d −1 ) and normalized hillslope unsaturated storage (HUS) and hillslope saturated storage (HSS) for the water year 2010-2011 (October to September).The size of the dots increases with time.Unsaturated storage (HUS) is computed from the moisture sensors in profiles sB1 and sB2; saturated storage (HSS) is represented using a normalized groundwater table level (computed from two piezometers in the hillslope).(a) is the 3-dimensional plot and (b, c, d) are the respective 2-dimensional projections of (a) on the three plans.

Figure 7 .
Figure 7. Conceptual scheme of successive mechanisms explaining the annual hysteresis between storages and stream flows.HUS: hillslope unsaturated storage; HSS: hillslope saturated storage; RUS: riparian unsaturated storage; RSS: riparian saturated storage; Q: stream flow.Bold characters indicate compartments with varying storage; grey arrows indicate whether the compartment is filling or emptying; black arrows indicate the water flow paths.

Figure 8 .
Figure 8. Year-to-year variations, for the 10 monitoring years, of the hysteresis indexes: (a) HSS-F5b(Q) and HSS-F4(Q) (HI) versus the initial groundwater table level depth in the corresponding hillslope piezometer (F5b or F4) and (b) HSS-F1b(Q) versus the initial groundwater table level depth in the piezometer in the riparian area (F1b).

Figure 9 .
Figure 9. Variations of observed (data) and simulated (M1 to M4) hysteresis index versus annual rainfall for the 10 monitored water years for (a) hillslope saturated storage versus discharge HSS(Q) and (b) riparian saturated storage vs. discharge RSS(Q).Solid lines indicate the linear regressions.

Figure 10 .
Figure 10.Observed and simulated annual hysteresis between stream flow (Q) and (a, b) saturated storage in the hillslope HSS (for observed hysteresis, HSS is the average of F5b and F4) and (c, d) saturated storage in the riparian area RSS (for simulated hysteresis, only M3 and M4 represent the riparian area), for the water years (a, c) 2003-2004 (wet year, calibration period) and (b, d) 2007-2008 (dry year, validation period).

Figure 11 .
Figure 11.(a) Mean annual hysteresis indexes observed and simulated with the four models M1 to M4 for hillslope saturated storage vs. discharge HSS(Q), hillslope unsaturated storage vs. discharge HUS(Q), hillslope unsaturated storage vs. hillslope saturated storage HUS(HSS) and riparian saturated storage vs. discharge RSS(Q).RSS is only simulated in models M3 and M4.Error bars show the standard deviation for the 10 years for HSS(Q) and RSS(Q) and the values for the 2 available years for HUS(Q) and HUS(HSS).(b) Sensitivity of hysteresis index values to parameter uncertainty for the year 2011-2012.Mx bbp indicates the value for best behavioural parameter sets; the circles, triangles, squares and diamonds indicate the mean HI value for the all the behavioural parameter sets, and the corresponding bars indicate its range of variation.

Table 1 .
Water balance and state and flux equations of the models used.

Table 2 .
Model structures and parameters.

Table 2 . Model structures and parameters.
F , k S, P max , L P , S Umax,H , β, C P

Table 2 . Model structures and parameters.
F , k S, P max , L P , S Umax,H , β, C P

Table 3 .
Prior and posterior distribution of the model parameters.