Hydrological hysteresis in catchments and its value for assessing process consistency in conceptual models

storage both in the saturated and unsaturated zones of the hillslope and the riparian zone of a headwater catchment in French Brittany (ORE AgrHys). The saturated zone storage was estimated using distributed shallow groundwater levels and the unsaturated zone storage using several moisture profiles. All hysteretic loops were characterized by a hysteresis index. Four conceptual models, previously calibrated 15


Introduction
Rainfall-runoff models are tools that mimic the low-pass filter properties of catchments.Specifically they aim at reproducing observed stream flow time series by routing time series of meteorological drivers through a sequence of mathematically formalized processes that allow a temporal dispersion of the input signals in a way that is consistent with the modeller's conception of how the system functions.The core of most models, in particular in temperate, humid climates, dominated by some type of subsurface flow, is a series of storage-discharge functions that, in most general terms, express system output (i.e.discharge and evaporation) as function of the system state (i.e.storage), thereby generating a signal that is attenuated and lagged with respect to the input signal (i.e.precipitation).
However, modelling efforts on the catchment scale typically face the problem that on that scale neither integrated internal fluxes nor the integrated storage and the partitioning between different storage components at a given time, can be easily observed within limited uncertainty.Indeed, indicators of catchment storage such as groundwater levels and soil water content can be highly variable in space, and exhibiting heterogeneous spatio-temporal dynamics.While spatial aggregation of storage estimates (e.g.catchment averages) in lumped models may lead to a loss of crucial information and thus to overly-simplistic representations of reality, allowing for explicit incorporation of spatial storage heterogeneity in (semi-)distributed models may prove elusive in the presence of data error and the frequent absence of detailed spatial knowledge of the properties of the flow domain.These problems were recently addressed in some studies that intended to assess catchment storage using all available data (McNamara et al., 2011;Tetzlaff et al., 2011) and showing the importance of this storage in thresholds observed in the response of discharge to precipitation in catchments.For example, Spence (2010) argued that the observed non-linear relationships between stream flow and catchment storage (i.e.no unique storage-discharge relations) are the manifestation of thresholds occurring in catchment runoff generation.Thus, depending on the structure of the system, storage-discharge dynamics can exhibit hysteretic patterns, i.e. the system response depends on the history and the memory of the system (e.g.Everett and Whitton, 1952;Ali et al., 2011;Gabrielli et al., 2012;Haught and van Meerveld, 2011).Andermann et al. (2012) found a hysteretic relationship between precipitation and discharge in both glaciated and unglaciated catchments in the Himalaya Mountains that was shown to be due to groundwater storage rather than to snow or glacier melt.Hrachowitz et al. (2013a), demonstrating the presence of hysteresis in the distribution of water ages, highlighted the importance of an adequate characterization of all system-relevant internal states at a given time, to predict the system response within limited uncertainty as flow can be generated from different system components depending on the wetness state of the system.
In catchment-scale rainfall-runoff models, the need for calibration remains inevitable (Beven, 2001) due to the presence of data errors (e.g.Beven, 2013), and to the typically oversimplified process representations (e.g.Gupta et al., 2012).In spite of their comparatively high degrees of freedom, such models are frequently evaluated only against one single observed output variable, e.g.stream flow.Although the calibrated models may then adequately reproduce the output variable, model equifinality (e.g.Savenije, 2001) will lead to many apparently feasible solutions that do not sufficiently well reproduce system internal dynamics as they are mere artefacts of the mathematical optimization process rather than suitable representations of reality (Gharari et al., 2013;Hrachowitz et al., 2013b;Andréassian et al., 2012;Beven, 2006;Kirchner, 2006).The understanding for the need for multi-variable and -objective model evaluation strategies to identify and discard solutions that do not satisfy all evaluation criteria applied is therefore gaining ground (e.g.Freer et al., 1996;Gupta et al., 1998Gupta et al., , 2008;;Gascuel-Odoux et al., 2010), as this will eventually lead to models that are not only capable of reproducing the observed output variables (e.g.stream flow) but that also represent the system internal dynamics in a more realistic way (Euser et al., 2013).The value of such multi-variable and/or -objective evaluation strategies has been demonstrated in the past, for example using groundwater levels (e.g.Fenicia et al., 2008;Molénat et al., 2005;Giustolisi and Simeone, 2006;Freer et al., 2004;Seibert, 2000;Lamb et al., 1998), soil moisture (Kampf and Burges, 2007;Parajka et al., 2006), saturated areas extension (Franks et al., 1998), snow cover patterns (e.g.Nester et al., 2012), remotely sensed evaporation, (e.g.Mohamed et al., 2006;Winsemius et al., 2008), and stream flow at sub-catchment outlets (e.g.Moussa et al., 2007).However, most studies using multiple response variables only evaluate them individually to identify Pareto optimal solutions.This practice may result in the loss of critical information such as the timing between the multiple variables.In other words it is conceivable that model calibration leads to Pareto-optimal solutions with adequate model performance for all variables, while at the same time misrepresenting the dynamics between these variables.Rather, using a synthetic catchment property (Sivapalan et al., 2005) or a hydrological signature (Wagener and Montanari, 2011;Yadav et al., 2007), combining different variables into one function, may potentially serve as a instructive diagnostic tool or as a calibration objective or even as a metric for catchment classification (Wagener, 2007).
The objective of this paper is to explore (i) the potential of using annual hysteric patterns observed between stream flow and water storage both in the saturated and unsaturated zones of the hillslope and of the riparian zone for characterizing the hydrological functioning of a small headwater catchment in French Brittany (ORE AgrHys) and (ii) to which degree a suite of conceptual rainfall-runoff models with increasing complexity can reproduce the observed storage-discharge hysteresis and if the use of the storage-discharge hysteresis can provide additional information for model diagnostics compared to traditional model evaluation metrics.

Study sites
Kerrien (10.5 ha) is a headwater catchment located in south-western French Brittany (47 • 35 N; 117 • 52 E, see Fig. 1).Elevations range from 14 to 38 m a.s.l., slopes are less than 8.5 %.The climate is oceanic, with mean annual temperature of 11.9 • C, minimum of 5.9 • C in winter and maximum of 17.9 • C in summer.Mean annual rainfall over the period 1992-2012 is 1113 mm (±20 %) and mean annual Penman potential evapotranspiration (PET) is 700 mm (±4 %).Mean annual drainage is 360 mm (±60 %) at the outlet.The catchment is laying under granite called leucogranodiorite of Plomelin, which upper part is weathered on 1 to more than 20 m deep.Soils are mainly sandy loam with an upper horizon rich in organic matter, depths are comprised between 40 and 90 cm.Soils are well drained except in the bottomlands which represent 7 % of the total area.Agriculture dominates the land use with 86 % of the total area.The base flow index is about 80-90 %, thus the hillslope aquifer is the main contributor to stream (Molénat et al., 2008;Ruiz et al., 2002).

Data
Meteorological data were recorded in an automatic weather station (CIMEL, Fig. 1) which provides hourly rainfall and variables required to estimate daily Penman PET (net solar radiation, air and soil temperatures, wind speed and direction).Discharge was calculated from water level measurements at the outlet (Fig. 1) using a V-notch weir equipped with a shaft encoder with integrated Data Logger (OTT Thalimedes) recorded every 10 min since 2000 (E3).Groundwater levels were monitored every 15 min since 2001 in 3 piezometers F1b, F4, and F5b (Fig. 1) using vented pressure probe sensors (OTT Orpheus Mini).Moisture in the unsaturated zone was recorded every 30 min since July 2010, at 7 depth (25,55,85,125,165,215, and 265 cm), on 2 profiles sB1 and sB2 (Fig. 1),  (2010-2011 and 2011-2012).

Catchment storage estimates
In order to obtain a proxy for the saturated zone storage at the catchment scale, the time series of groundwater level were normalized between their minimal and maximal values over the 10 years of records so that the normalized value is comprised between 0 and 1.The resulting normalized variable exhibited very similar dynamics among all the piezometers (see Fig. 2a).However, the piezometer located in the riparian zone (F1b) exhibited variations at a higher frequency, especially during the winter.Therefore, in the following, we used the average of the normalized level in the two hillslope piezometers (F5b, F4) as a proxy for the hillslope groundwater storage, and the normalized level in the riparian piezometer as a proxy for the riparian groundwater storage.
In order to obtain a proxy for the unsaturated zone storage, moisture time series were also normalized using the minimal and maximal values observed in all the sensors of the two profiles over the two water years with complete records, setting the minimal value as 0 and the maximal value as 1.As the obtained normalized unsaturated storage variables were following very similar trends and dynamics, we used in the following an average of the normalized unsaturated zone storage among all the measurement points (depths and profiles) (Fig. 2b).The two profiles being located on the upslope and downslope parts of the hillslope, we considered that this average value is a proxy for the unsaturated zone storage dynamics on the studied hillslope.

Hysteresis indices
Studies on hysteretic relationships in catchments generally focus on qualitative descriptions of patterns associated with a cross correlation analysis between the two variables (Frei et al., 2010;Hopmans and Bren, 2007;Jung et al., 2004;Salant et al., 2008;Schwientek et al., 2013;Spence et al., 2010;Velleux et al., 2008).Some authors proposed a typology of hysteretic loops based of their rotational direction, curvature and trend to identify solute controls during storm events (Butturini et al., 2008;Evans and Davies, 1998).For storage-discharge hysteresis at the annual scale, this approach is not sufficient as the same type of hysteretic loop is likely to happen for most of the years: here a preliminary analysis revealed that storage and stream flow are strongly correlated, and cross correlation value is the greatest for lag time of 0 day (results not shown).
Quantitative descriptions of the hysteretic loop are also found in the literature, and various ways of computing hysteresis indices (HI) have been proposed, for example using the relative difference between extreme concentration values (Butturini et al., 2008) or using the ratio of turbidity values in rising and falling limbs of the storm hydrograph at the mid-point discharge value (Lawler et al., 2006).The latter authors argue that computing HI by using mid-point discharge usually allows avoiding the small convolutions which are frequently observed at both ends of the hysteretic loop.
In this paper, we calculated the hysteresis index (HI) each year as the difference between water storages at the dates of mid-point discharge in the two phases of the hydrological year, during the recharge period (R) and the recession period (r), i.e. respectively before and after reaching the maximal discharge Q max , as follow: where S(t) is the storage value at time t and Q(t) the stream flow value at time t.The mid-point discharge Q mid , is defined as the mean value of discharge between Q 0 the initial value at the beginning of the hydrological year (October), and Q max the maximal value reached during that year: (2) Computing HI using the difference in storage was possible here because storage and stream flow values are varying among years within a narrow range of magnitude, while Lawler et al. (2006) used the ratio because turbidity can differ of several orders of magnitude from one storm to the other.Computing HI with the difference between the values of storage and not their ratio allowed maintaining its sensitivity to the year to year variations of the width of the hysteretic loop.Water storage dynamics in the unsaturated and saturated zones were approximated by the difference in normalized soil moisture content and by the difference in normalized groundwater level, respectively.HI gives 2 types of information (i) its sign indicates the direction of the loop (anticlockwise loop induces a negative value of HI whereas a clockwise loop leads to a positive value of HI) and (ii) its absolute value is proportional to the magnitude of the hysteresis (i.e. the width of the hysteretic loop).HI is a proxy for the importance of lag time response between variations in catchment storages (unsaturated and saturated) and stream discharge.Therefore, it can be used for comparing the capacity of the different models to reproduce to some extent the observed storages/discharge relationships.

Models
In a previous work, a range of 11 conceptual models were calibrated and evaluated for the Kerrien catchment in a stepwise development using a flexible modelling framework (see Hrachowitz et al., 2014).In this study, adopting a flexible stepwise modelling strategy, four models (M1-M4) with increasing complexity, i.e. allowing for more 5671 process heterogeneity, were calibrated and evaluated for the study catchment.Model M1 resembles frequently used 3-box HBV type models (e.g.Bergström, 1995) with 7 parameters.The three boxes represent respectively an unsaturated zone, a slow responding and a fast responding reservoir.In Model M2, deep infiltration losses are integrated from the slow store to take into account the significant groundwater export to adjacent catchments in this study catchment as indicated by the observed longterm water balance (Ruiz et al., 2002).It is done by adding a threshold into this storage to allow for continued groundwater export from a storage volume below the stream during 0-flow conditions, i.e. when the stream runs dry.As riparian zones frequently exhibit a distinct hydrological functioning, (e.g.Molénat et al., 2005;Seibert et al., 2003), indicated in the study catchment by distinct response dynamics in the riparian piezometers (Martin et al., 2006), models M3 and M4 additionally integrate a wetland/riparian zone component, composed of an unsaturated zone store and a fast responding reservoir, parallel to the other boxes.The riparian unsaturated zone generates flow using a linear function in M3 and a non-linear function in M4.The complete set of water balance and constitutive model equations of the four models is listed in Table 1 while the model structures are schematized in Table 2.

Calibration
The models have been calibrated for the period 1 October 2002-30 September 2007 after a one-year warm-up period, using a multi-objective calibration strategy (e.g.Gupta et al., 1998), based on Monte Carlo sampling (10 7 realizations).The uniform prior parameter distributions used for M1-M4 are provided in Table 3.Four calibration objective functions were used including the Nash-Sutcliffe efficiencies (Nash and Sutcliffe, 1970) where X is either Q, log(Q) or the FDC.The last calibration objective function is the volumetric efficiency for stream flow (Criss and Winston, 2008): As mathematically optimal parameter sets are frequently hydrologically sub-optimal, i.e. unrealistic (e.g.Beven, 2006), all parameter sets within the 4-dimensional space spanned by the calibration Pareto fronts, as approximated by the cloud of sample points, were retained as feasible.Model uncertainty intervals were constructed by weighting the feasible parameter sets with a likelihood measure L = D p E , (e.g.Freer et al., 1996) with an exponent of p = 10 to emphasize models with good calibration performance.D E is the Euclidean distance to the perfect model, i.e.E NS,i = E V ,i = 1, representing a combined performance metric of all 4 calibration objective functions (e.g.Hrachowitz et al., 2013a;Gascuel-Odoux et al., 2010): The tested models were in a first step evaluated against their respective skills to predict the system response with respect to the calibration objectives during a validation period (1 October 2007-30 September 2012) then using a selection of 13 catchment signatures (low/high flows dynamics, groundwater dynamics, flow duration curves, autocorrelation) in a multi-criteria posterior evaluation strategy.Figure 3 shows the global performance of the 4 models obtained on both calibration and validation using 5673 the 4 objective functions and the 13 hydrological signatures, in terms of the Euclidean Distance constructed from all objective functions and signatures to the perfect model.Model M1 provided good performance in calibration on the objective functions while its validation performances were considerably decreased.Its ability to reproduce the different signatures showed that it failed in particular to reproduce flow in wet periods and groundwater dynamics.Model M2 led to calibration performances slightly lower than model M1 but higher validation performances.The hydrological signatures simulated by M2 exhibited lower uncertainties both in validation and calibration periods because of better simulation of low flow conditions and groundwater dynamics.Model M3 provided similar performances as M2 for calibration and for validation but with clearly reduced uncertainty bounds.Overall signatures reproduction was improved because of clear improvement of low flow and groundwater related signatures even if performance in calibration objective functions remained lower that for model M1.Model M4 exhibited similar performances than the previous models both in calibration and validation periods but better performance for the whole set of signatures and lower uncertainties.
More details on the model calibration and evaluation with respect to hydrological signatures can be found in Hrachowitz et al., (2014;note that M1, M2, M3 and M4 presented in this study correspond respectively to M1, M6, M8 and M11 in the original paper).Within the obtained range of parameter uncertainty, the types of simulated hysteresis patterns were not affected by the parameter values but only by the model structures.Note that we restricted the following analysis only to the optimal parameter set in each case, first for the sake of clarity and also because at this stage, our interest was on assessing the ability of model structures to reproduce the observed general features in hysteresis patterns and not on quantifying their performances in fitting the observations.

Observations in hillslope and riparian zones
The 2-dimensional observed relationship between saturated storage in the hillslope (HSS) or in the riparian zone (RSS) and stream discharge (Q) for each year was hysteretic, highlighting the non-uniqueness of the response of discharge to storage depending on the initial conditions and a lag time between both variables dynamics in particular during the recharge period as illustrated in Fig. 4 for two contrasted water years.
The direction of the hysteretic loop was different depending on the topographic position of the piezometer: loops were always anticlockwise (leading to negative values of HI) for the piezometer located at the top of the hillslope HSS-F5b(Q), mostly anticlockwise for the mid-slope piezometer HSS-F4(Q) and mostly clockwise (positive values of HI) for the piezometer in the riparian zone RSS-F1b(Q) (Fig. 5).
In the riparian zone, storage at Q mid was usually lower in the recession period than in the recharge period, especially in dry years, leading to positive HI.This is due to the fact that riparian groundwater level increased early at the beginning of the recharge period, before the stream discharge, due to the limited storage capacity of the narrow unsaturated layer in bottom lands reinforced by groundwater ridging which is linked to the extent of the capillary fringe.However, the hysteretic loops were narrow and for wet years, the storage value during the recession period occasionally exceeded the value in the recession period without modifying the general direction of the hysteresis when looking at the whole pattern (e.g. in [2003][2004], see Fig. 4a).When this occurred at the time of Q mid , it led to negative HI although absolute values remained small (Fig. 5) The hillslope groundwater responded later than the stream, due to the deeper groundwater levels and higher unsaturated storage capacity (Rouxel et al., 2011), both introducing a time lag for the recharge and thus for the groundwater response.This led to negative values of HI as groundwater levels in recession periods were higher than in recharge periods for the same level of discharge (in particular at Q mid ).The loops were also wider in the hillslope, leading to high absolute values of HI (Fig. 5).
The intermediate behavior of the mid-slope piezometer (F4), exhibiting varying patterns throughout the years, reflects the fact the riparian zone extends spatially towards the hillslope and reaches a larger spatial extension during wet years.
Similar observations have been reported by other authors.For example, anticlockwise hysteresis between groundwater tables and discharge are observed by Gabrielli et al. (2012) in the Maimai catchment, while studies on riparian groundwater or river bank groundwater report clockwise hysteresis at the storm event scale (Frei et al., 2010;Jung et al., 2004).Similar pattern were also observed by Jung et al. (2004) who found that in the inner floodplain and in river bank piezometers, the hysteresis curve between water table and river stage exhibit a synchronous response, while in the hillslope hysteresis curves are relatively open, as the water table is higher during the recession than during the rising limb.

Observations in hillslope
Figure 6 shows the 3-dimensional relationship between hillslope saturated storage (HSS), unsaturated storage (HUS), and stream flow (Q) for the year 2010-2011.Four main periods can be identified, similar to what was outlined in recent studies (e.g.Heidbuechel et al., 2012;Hrachowitz et al., 2013a): three are characterizing the recharge period and the last one the recession period.First, stream flow was almost exclusively sustained by drainage of the saturated storage close or equal to zero, while the unsaturated zone exhibited a significant storage deficit and only minor fluctuations due to transpiration and small summer rain events (dry period).As more steady precipitation patterns set in, here typically around November, the unsaturated zone storage relatively quickly reached its maximal value, rapidly establishing connectivity of fast responding flow pathways (wetting up period).This led to a relatively rapid increase in stream flow while the saturated storage did not change much until the end of this period as incoming precipitation first had to fill the storage deficit in the unsaturated zone before significant increase in percolation could occur.A further lag was introduced by the time taken for water to percolate and eventually recharge the relatively deep groundwater.As soon as conditions were wet enough to allow for established percolation, the saturated storage eventually also responded, increasing faster than the stream flow (wet period) while unsaturated storage remained full.During the wet period (or high flow period), no pattern appeared clearly because all storage elements were almost full and the response of all the compartments were more directly linked to the short term dynamics of rain events.Finally during the recession period (drying period), unsaturated storage decreased comparatively quickly by drainage and transpiration while the saturated storage may keep increasing for a while by continued percolation from the unsaturated zone before decreasing through groundwater drainage at a relatively slow rate.A similar pattern was also observed for 2011-2012 (not shown).
The unsaturated zone storage followed a clockwise hysteresis loop with the stream flow and with the saturated zone storage.The hysteresis indices (Fig. 5, years 2010-2011 and 2011-2012) confirmed these directions, and showed that the hysteresis loops were narrower for unsaturated storage than for saturated storage, inducing smaller absolute values of the hysteresis indices due to the small size of the unsaturated storage compartment compared to the saturated storage compartment.

Interpretation
There are 3 main hypotheses generally proposed to interpret storage-discharge hystereses in hydrology.The first one is related to the increase of transmissivity with the groundwater level due to the frequently observed exponential decrease of hydraulic conductivity with depth.However, this would lead to systematic clockwise hysteresis loops and cannot explain the anticlockwise patterns observed between hillslope saturated storage and stream flow.The second hypothesis proposed by Spence et al. (2010) is that during the recharge period, the groundwater storage is not 5677 only increasing locally (as measured by the piezometric variations) but also the spatial extension of connected storage increases gradually, while during the recession period, the storage is decreasing homogenously across the entire contribution area.This is likely for riparian groundwater and could explain the clockwise hysteresis observed on this piezometer but cannot explain the anticlockwise hysteresis observed in the hillslope groundwater.The third hypothesis is that dominant hydrological processes are different between recharge and recession periods.For instance, Jung et al. (2004) interpret their clockwise hysteresis in peatlands groundwater as the results of a stepwise filling process during the rising flows (fill and spill mechanism) opposed to a more gradual drainage of the groundwater during the recession combined with the first hypothesis result, similar to what was found by Hrachowitz et al. (2013a).This hypothesis of different hydrological pathways allows an adequate interpretation of the opposite directions of the observed hystereses.Recharge period is characterized by a quick filling of the unsaturated and saturated storages in the riparian zone which is always close to the saturation while the saturated storage on the hillslope is not yet filling up (wetting up period).Thus, the wetting up period is characterized by an increase of stream flow, here mainly generated in the riparian zone, and eventual quick flows in the hillslope while hillslope unsaturated zone is reaching the storage capacity volume.At the beginning of the wet period, hillslope saturated storage is filling and starts to contribute to the stream along with riparian and fast flows.During the recession period (drying period), hillslope saturated zone is the only compartment which continues to sustain stream flow.If so there are three contributions to stream flow in the wet period while during the recession period, hillslope groundwater remains the only contributor to stream flow (cf.Hrachowitz et al., 2013a).This can explain the difference between storages values between recharge and recession periods.Finally, the hysteretic hydrological signature is not only related to the amount of stored water in the catchment but rather to where it is stored.These results are consistent with previous studies: the distinction between riparian groundwater and hillslope groundwater components has also been identified in similar catchments by (Molénat et al., 2008) based on nitrate concentrations analysis and by Aubert et al. (2013a) based on a range of solutes, and in other site as by (Haught and van Meerveld, 2011) using such Q-S relationships and lag time analysis.

Sensitivity of HI to initial conditions
Sensitivity to antecedent soil moisture conditions are often cited as an explanation for observed storage-discharge hysteresis and its variability between years.Initial levels of each store will obviously influence the time required to fill them and consequently the duration of the successive periods identified in the whole recharge period.As only 2 years of data were available, it was not possible to define a relationship between the initial average soil moisture and the magnitude of the hysteresis indices.However, the magnitude of HI was lower for high initial values of average unsaturated zone storage for both the saturated and unsaturated zones (Table 4).The HI for midslope saturated zone (F4b) seemed to be more sensitive too these initial moisture conditions than HI for upslope saturated zone and unsaturated zone.Similarly, the width of the loop (absolute value of HI) was little sensitive to initial groundwater levels in the hillslope: although the larger absolute values of HI were observed for the lower initial water table levels, no clear correlation was observed (Fig. 7).

Sensitivity of HI to annual rainfall
The saturated zone, HI values were negatively correlated with the total annual rainfall for both the hillslope and the riparian zone, with a more negative slope for the hillslope (Fig. 8 hillslope where hytereses are anticlockwise and smaller absolute value of HI for the riparian zone where hystereses are clockwise).In the riparian zone, when rainfall and maximal drainage reached very high value, it could lead to saturated storage value at the time of mid-point discharge in the recession period larger than the corresponding value during the recharge period, explaining the inversion of the sign of HI for RSS(Q) in very wet years.

Hysteresis simulations
For all years, all models (M1-M4) exhibited a hysteretic relationship between stream flow and the storages, as shown in Fig. 9 for the years 2003-2004 and 2007-2008, pertaining to the calibration and validation periods respectively.This means that all tested models introduced a lag time between catchment stores and the stream dynamics.The Fig. 10 presents the observed and modelled average and standard deviation of annual hysteresis Indices, for Hillslope saturated storage vs. discharge HSS(Q), Hillslope unsaturated storage vs. discharge HUS(Q), Hillslope unsaturated storage vs. Hillslope saturated storage HUS(HSS), and Riparian saturated storage vs. discharge RSS(Q).As Riparian saturated storage (RSS) is not modelled in M1 and M2, simulated RSS(Q) was available only for M3 and M4.For M1, the shape of the simulated hysteresis showed an overestimation of hillslope saturated storage (HSS) and of flow during dry years (e.g. the year 2007-2008 shown in Fig. 9).This was expected as we have seen that the model was unable to reproduce groundwater dynamics and the low signatures during the validation period (Fig. 3a and c).Simulated HI values were close to the observed ones for HSS(Q) (Fig. 10).The simulated hysteresis indices were small and negative for HUS(Q) while hillslope saturated storage was partially compensated by the underestimation of the hillslope unsaturated storage.This reveals the poor consistency of the model and explains why it was able to reach good performance in the calibration period but not in the validation period (Fig. 3).
For the model M2, the shape of the hysteresis loops showed a considerable underestimation of HSS and a large underestimation of stream flow in wet years (Fig. 9).Compared to M1, although the introduction of deep losses in M2 led to higher validation performances and better simulation of hydrological signatures (Fig. 3), the simulated HI (Fig. 10) were worsened, suggesting a poorer model consistency with respect to internal hydrologic processes.
For both models M3 and M4, the introduction of a riparian compartment improved the simulated hysteretic loops, due to a better simulation of stream flow in wet years, but HSS was still largely underestimated (Fig. 9).The mean HI values for HSS(Q) were close to the observed one, but the range of variation were smaller, indicating a reduced sensitivity to climate (Fig. 8).The mean values for HUS(Q) were clearly improved compared to M1 and M2, as the direction of the loop were clockwise as for the observed ones, although still underestimated.The mean HI values for HUS(HSS) were also greatly improved.The shape of the simulated hysteresis loops between riparian saturated storage (RSS) and stream flow (Q) showed a large underestimation of RSS especially during the recession period (Fig. 9c and d).This led to simulated HI for RSS(Q) which are positive, like the observed ones, but also largely overestimated (Fig. 10).Overall, these results suggest that for models including a riparian component, the underestimation of the hysteresis between HUS and Q was compensated by an overestimation of the hysteresis between RSS and Q.This highlights that despite a significant improvement in performances and improved hydrological signature reproduction, these models still involve a certain degree of inconsistency with respect to internal processes.However, M4 provided the most balanced performance considering hysteretic signatures between all storage components and strongly underlines the limitations of overly simplistic model architectures (e.g.M1) and the need for more complete representations of process heterogeneity.

Sensitivity of modelled hysteresis indices to annual rainfall
All models were able to represent the decrease of the hysteresis indices with annual rainfall in the hillslope, the slope of the correlation getting closer to the observed one from M1 to M4 (Fig. 8).The introduction of deep groundwater losses (M2) led to smaller saturated storage during recharge periods and increased the difference between saturated storage during recharge and recession periods at the time of midpoint discharge.However, as all models tended to overestimates low stream flow values, the slopes of the correlations between annual rainfall and simulated HI were smaller than for the observed one.
In the riparian zone, the modelled trends were the inverse of the observed one.The modelled recessions were always very sharp (see Hrachowitz et al., 2014), and the simulated riparian storage dried up every year, explaining that saturated storage at the time of mid-point discharge during the recession periods were much greater than during the discharge periods.This led to a general overestimation of HI values which were even stronger for wet years.This overestimation may be related to an improper conceptualization of the riparian zone functioning, which is never connected to hillslope reservoir in the tested models.In reality, during high flow periods, observed hydraulic gradient increased along the hillslope inducing a connection between riparian and hillslope reservoirs which are disconnected during low flow periods.

Value of such internal signatures for model evaluation
The use of hydrological hysteretic signatures in models assessments led to conclusions that were consistent with the classical hydrological signatures used in Hrachowitz et al. (2014).However, model M2 was less able to reproduce the different hysteretic signatures whereas it led to a real improvement regarding to the classical signatures in low flows.Considering only the distance between observed and simulated hysteresis indices on hillslope saturated storage and stream flow would lead to select model M1.This highlights the fact that using saturated storage dynamics alone can be deceptive for understanding the system response behaviour and that it is thus crucial to also consider the hysteretic signatures of unsaturated and riparian zones in a combined approach to develop a more robust understanding of the system.Here, hysteretic signatures of the unsaturated and riparian zones provided valuable additional assessment metrics regarding the performance of models M3 and M4 to represent the riparian zone.It was possible to identify when the model failed to represent processes and which processes are mostly compensating for missing ones and therefore why the model may provide some good performance for wrong reasons.To do so, the hysteresis index proved to be a useful proxy of hystereses themselves as it exhibited contrasted patterns sensitive to climate and localization within the catchment.

Perspectives: toward integrated hydrological-signatures-based modelling?
A general issue in model calibration is that because of over-parameterization of hydrological models and because the objective functions integrate generally only one variable, like the stream flow, automatic calibration techniques may lead to parameter sets which compensate for internal models errors.These parameters sets are mathematically correct but wrong in a hydrological point of view.The subsequent model is then to be considered non behavioural (Beven, 2006).For instance, if storage properties are not well taken into account by the model, this is likely to lead to a wrong simulation of storage dynamics in response to precipitation and thus the parameterization using traditional objective functions can lead to compensation of these errors in order to simulate a discharge value close to observed one while the storage is wrong.In such case, a model able to represent the internal catchment behaviour will generate a wrong discharge value but consistent with the storage value and will be rejected in traditional calibration procedures.To handle this issue and in order to select behavioural models, one can use multiple objectives objective functions 5683  (Gupta et al., 1998;Seibert and McDonnell, 2002;Freer et al., 2003), including a range of hydrological signatures to be reproduced or additional realisms constraints (Kavetski and Fenicia, 2011;Yadav et al., 2007;Yilmaz et al., 2008;Euser et al., 2013;Gharari et al., 2013;Hrachowitz et al., 2014).We argue that rather than increasing the number of constraints or objective functions to satisfy, an alternative could be to use some objective functions that are able to reflect the relationship between different variables using a combination of different variables as stream flow and the groundwater level, soil moisture, or stream concentrations.Such combined objective function would be more constraining for model selection.Therefore, the present study is a first step which aims at highlighting the still underexploited potential of hydrological hysteresis.The next step would be to quantify these relationships though functions or several indices usable in calibration criteria such as the Hysteresis Index proposed in this study.Moreover, such criteria could be used in classification studies.Indeed, some studies on the literature present storage/discharge relationships for different catchments that show patterns that are similar or not to the ones we observed in the Kerrien catchment (Ali et al., 2011;Gabrielli et al., 2012).This signature may help to classify catchments in terms of dominant processes driving their behaviour.
A remaining difficulty to integrate storage in calibration or evaluation procedure in hydrological modelling is how to measure this storage.McNamara et al. (2011) and Tetzlaff et al. (2011) proposed to use all available data from groundwater level monitoring, soil moisture records, water budget, modelling results, and so on, to estimate the storage in catchments.In this study, we used a quite dense network of piezometers and soils moisture measurements relatively to the small size of the catchment.Promising ways to estimate spatial quantification of storage in catchments include remote sensing of soil moisture (Sreelash et al., 2013;Vereecken et al., 2008), gravimetric techniques (Creutzfeldt et al., 2012), geodesy and geophysical methods.The interest of such techniques would be to provide a spatially integrated vision of the catchment water content.As for the different hydrological variables, the combination of hydrological and chemical variables appears relevant to investigate the hydrochemical behaviour of catchment.Hysteresis patterns between concentration and discharge have been largely documented for storm events characterization (Evans and Davies, 1998;Evans et al., 1999;Taghavi et al., 2011).Some studies are also reporting similar patterns at the annual scale (e.g.Aubert et al., 2013b).Such hysteretic relationships have been observed also between water and chemistry in groundwater (Rouxel et al., 2011;Hrachowitz et al., 2013a) emphasizing a disconnection between water and solutes dynamics that simple diffusion or partial mixing processes cannot explain.Stream water chemistry exhibits also particular seasonal cycles with different phasing with discharge depending on the solutes (Aubert et al., 2013a).This provides extra information on the water pathways within the catchment.These relationships appears also powerful to constraint hydrochemical modelling.

Conclusion
A method to characterize and quantify partially the relationship between storages in a headwater catchment and stream flow along the year has been proposed.It allowed us to then assess the ability of a range of conceptual lumped models to reproduce this catchment internal signature.Catchment storage has been approximated using a network of piezometric data and several unsaturated zone moisture profiles to consider the storage in the saturated as well as in the unsaturated zones.
The observations showed that storage/discharge relationships in catchments can be hysteretic highlighting a successive activation of different hydrological components during the recharge period while the recession exhibits a fast decrease of unsaturated and riparian storages and a slow decrease of hillslope saturated storage which sustains the stream flow.Four periods have been identified along the hydrological year.Riparian and Hillslope saturated storages exhibited different patterns with opposite directions of the hysteretic loops.The tested models were characterized by an increasing degree of complexity.All of them simulated a hysteretic relationship between storages and discharge, but the ability to reproduce each storage dynamics relatively to the others increased with the model complexity.They were previously calibrated using classical objective functions and assessed using classical hydrological signatures, and their overall performance at reproducing hysteretic signatures was consistent with their overall performance at reproducing the classical signatures.The analysis of the simulated hysteresis signatures helps to identify why the model may give a right answer for wrong reasons and may be used as a descriptor of the internal catchment functioning.5707

5665
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

5667
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 2 Material and method

5669
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Discussion
Paper | Discussion Paper | Discussion Paper | Discussion Paper | respectively for stream flow Q, for the logarithm of the stream flow 5672 the Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | log(Q), and for the flow duration curve (FDC) based on log(Q) defined as follows:

Discussion
Paper | Discussion Paper | Discussion Paper | Discussion Paper |

5675
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Discussion
Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | , Data).Wet years (i.e.large values of annual rainfall) are generally associated with large values of annual maximal and mid-point stream flows and also to large values of groundwater table level, leading to larger saturated storage values during the recession period, while the storage values during the recharge period do not change much from year to year.Thus, larger storage values at the time of mid-point discharge in the recession period led to smaller values of HI (i.e larger absolute values for the 5679 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the observed values were large and positive.Simulated HI values for HUS(HSS) were also overestimated.These results show that in model M1, the overestimation of the 5680 For Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

5681
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Discussion
Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

5685
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Figure 2 .Figure 3 . 8 Figure 3 .Figure 4 .Figure 4 .Figure 5 .Figure 5 .
Figure 2. Normalized (a) groundwater levels for piezometers in the hillslope (F4 and F5b) and in the riparian zone (F1b) and (b) average, maximum and minimum unsaturated zone storages for all the sensors on the two profiles, on the Kerrien catchment.

Figure 6 .Figure 7 .Figure 8 .-Figure 7 .Figure 8 .
Figure 6.Evolution of stream flow (Q in mm d −1 ) and normalized Hillslope Unsaturated Storage (HUS) and Hillslope Saturated Strorage (HSS) for the water year 2010-2011 (October to September).The size of the dots is increasing with time.Unsaturated storage (HUS) is computed from the moisture sensors in profiles sB1 and sB2), saturated storage (HSS) is represented using normalized groundwater table level (computed from 2 piezometers in the hillslope).(a) is the 3-D plot and (b-d) are the respective 2-D projections of (a) on the three plans.

Figure 7 .Figure 8 .Figure 8 .Figure 8 .
Figure 7. Year to year variations, for the 10 monitoring years, of the hysteresis indices (a) HSS-F5b(Q) and HSS-F4(Q) (HI) vs. the initial groundwater table level depth in the corresponding hillslope piezometer (F5b or F4) and (b) HSS-F1b(Q) vs. the initial groundwater table level depth in the piezometer in the riparian area (F1b).

Figure 9 .Figure 9 .Figure 10 .Figure 10 .
Figure 9. Observed and simulated annual hysteresis between stream flow (Q) and (a, b) Saturated 1 Storage in the hillslope HSS (for observed, HSS is the average of F5b and F4) and (c, d) Saturated 2 Storage in the riparian area RSS (for simulated, only M3 and M4 represent the riparian area), for the 3 water years (a, c) 2003-2004 (wet year, calibration period) and (b, d) 2007-2008 (dry year, validation 4 period).5 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | using capacitive probes which provide volumic humidity based on Frequency Domain Reflectometry (EnvironScan SenteK).Due to technical problems, data are missing in December 2012 and January 2013, then only two complete water years were available

Table 1 .
Water balance, state and flux equations of the models used.