Comparison of artificial neural networks and reservoir models for simulating karst spring discharge on five test sites in the Alpine and Mediterranean regions

. Hydrological models are widely used to characterise, understand and manage hydrosystems. Data-driven models are of particular interest in karst environments given the complexity and heterogeneity of these systems. There is a multitude of data-driven modelling approaches, which can make it difficult for a manager or researcher to choose. We therefore conducted a comparison of two data-driven modelling approaches: artificial neural networks (ANN) and reservoir models. 20 We investigate five karst systems in the Mediterranean and Alpine regions with different characteristics in terms of climatic conditions, hydrogeological properties and data availability. We compare the results of ANN and reservoir modelling approaches using several performance criteria over different hydrological periods. The results show that both ANN and reservoir models can accurately simulate karst spring discharge, but also that they have different advantages and drawbacks: (i) ANN models are very flexible regarding the format and amount of input data, (ii) reservoir models can provide


Introduction
Karst systems are complex and heterogeneous media.High contrasts in porosity and permeability induce high variability in infiltration and internal flow processes (Bakalowicz, 2005;Ford and Williams, 2007), which can be difficult to assess.Considering the increasing demand for water and that around 9 % of the world's population (up to 90 % in some parts of the Mediterranean area) depends on karst water resources for drinking water supply (Stevanović, 2019), the characterization of karst system functioning and water availability become a major challenge for water resource management.Among the numerous methods to study karst systems (Goldscheider, 2015), hydrological models are useful to characterize karst functioning and specially to predict the impact of climate and land use changes (Hartmann et al., 2014).Hydrological models can be grouped into lumped parameter and distributed approaches (Kovács and Sauter, 2007).While distributed models divide a karst system into a two-or threedimensional grid, for which each cell is assigned appropriate hydraulic parameters and system states, lumped parameter models are based on the mathematical analysis of input data (e.g.precipitation, temperature) for simulating spring discharge time series.They include (i) "black-box" models such as neural-network-based approaches, which use no a priori information about the functioning of a system, and (ii) "conceptual" models, which are based on a conceptual representation of a karst system -e.g. for the reservoir models, a succession of one or several reservoirs using simplified physical transfer functions.
The choice of a modelling approach depends mainly on the objective of the study but also on the current knowledge of the system, the available data and regional/institutional preferences (Addor and Melsen, 2019).For karst systems, the available data are often scarce and poorly reflect the heterogeneity of the meteorological and karst processes.Distributed models require a lot of diverse data with high spatial and temporal resolution for defining physical parameters and thus can be tough to use in a scarce data context (Hartmann et al., 2014).On the other hand, lumped parameter models permit the study of complex and heterogeneous karst systems without requiring extensive meteorological and systemrelated data with high spatial resolution.Artificial neural networks (ANNs) have been successfully used to simulate karst spring discharge (Kurtulus and Razack, 2007;Hu et al., 2008;Meng et al., 2015;Wunsch et al., 2022), predict and forecast water flood/inrush (Wu et al., 2008;Kong-A-Siou et al., 2011) and manage the exploitation of karst aquifers (Yin et al., 2011;Kong-A-Siou et al., 2015).Reservoir models have also been successfully used to simulate karst spring discharge (Fleury et al., 2007;Dubois et al., 2020), manage the exploitation of karst aquifers (Fleury et al., 2009;Zhou et al., 2021) and characterize specific functioning in karst systems (Perrin et al., 2003;Jukić and Denić-Jukić, 2009;Tritz et al., 2011;Bittner et al., 2020).This approach is well suited to karst systems due to the high heterogeneity and low level of knowledge of their structure (Fleury et al., 2009;Hartmann et al., 2012).Although several authors compared the performance of different ANN models (Kurtulus and Razack, 2010;Kovačević et al., 2018;Cheng et al., 2020) and studied structure and parameter equifinality in reservoir models (Makropoulos et al., 2008;Gondwe et al., 2011;Hartmann et al., 2012;Mazzilli et al., 2012), only a few studies have been conducted on the comparison of both approaches in karst environments (Kong-A-Siou et al., 2014;Sezen et al., 2019;Jeannin et al., 2021).Kong-A-Siou et al. (2014) observed that ANN models are more effective at accounting for the non-linearity of karst systems during extreme events (dry and flood periods), while reservoir models were better at representing the hydrological functioning of the system during intermediate water periods.Sezen et al. (2019) observed that ANN models were better at simulating low-flow periods and reservoir models for simulating spring discharges on predominantly non-karst catchments.Jeannin et al. (2021) emphasized the great potential of ANN models but highlighted two main limitations: (i) they require long time series to accurately learn the functioning of a karst system, and (ii) usually no information about specific functioning of a system can be deduced from the results.
The performance of the ANN and reservoir models can therefore be influenced by the characteristics of the catchment and the format and length of the input data.The aim of the present study is to help researchers and stakeholders to choose between the ANN and reservoir modelling approaches to simulating karst spring discharge, depending on their purpose and the available data.This research provides the first extensive comparison of ANN and reservoir models in karst hydrology by investigating results on five study sites with different contexts and input data.We use ANNs as they have proven to be fast and reliable for modelling hydrological time series (Van et al., 2020;Jeannin et al., 2021;Wunsch et al., 2021).We specifically apply one-dimensional convolutional neural networks (CNNs) because in an earlier study (Wunsch et al., 2022) we were able to demonstrate their high ability to perform karst spring discharge modelling.Furthermore, they have some favourable properties compared to popular recurrent neural networks (e.g. the LSTMs), such as a batch-wise training procedure which makes them considerably faster and computationally less expensive.Reservoir modelling is carried out using the KarstMod platform, as it provides a powerful modular interface for varying the structure, parameters and transfer functions of the conceptual model (Mazzilli et al., 2019).This research seeks to address the following research questions.
-What are the advantages and drawbacks of the ANN and reservoir models in karst hydrogeology?
-To which extent can the ANN and reservoir models be used to get a better understanding of system functioning?
-What are the implications from a stakeholder's perspective?
2 Data and study sites We compare the ANN and reservoir modelling approaches using data from five different well-studied karst systems (Table 1, Fig. 1).All the systems have different characteristics in terms of hydrogeological properties (e.g.catchment area, karstification), data availability (e.g.length of the time series, number of meteorological stations, time step) and environmental conditions (e.g.climate, anthropogenic influence).Each study site is detailed in the following sub-sections, and further details about the meteorological data can be found in Table A1.

Aubach spring, Austria
Aubach spring (1080 m a.s.l.) is a large non-permanent spring located in the Hochifen-Gottesacker area, on the border between Germany and Austria (northern Alps).The Hochifen-Gottesacker system covers an area of about 35 km 2 and its altitude varies between 1000 and 2230 m a.s.l.(Chen et al., 2018).The area is under a cool, temperate and humid climate and is strongly affected by snow accumulation and melting, which typically occur between November and May (Chen et al., 2018).The spring is located in the Schwarzwasser Valley, which follows the geological contact between highly karstified limestone (Schrattenkalk formation) in the northern and western parts and impermeable sedimentary rocks of the Flysch zone in the southern part (Goldscheider, 2005).The main catchment of Aubach spring is estimated to be approximately 9 km 2 (Goldhttps://doi.org/10.5194/hess-27-1961-2023Hydrol.Earth Syst.Sci., 27, 1961Sci., 27, -1985Sci., 27, , 2023Sci., 27, scheider, 2005;;Chen and Goldscheider, 2014).The spring also receives inflow from several upstream karst catchments and the Flysch zone, where surface runoff can sink into an estavelle and pass through an underground karst conduit during low-flow periods, as demonstrated by multiple tracer tests (Goldscheider, 2005).Precipitation and temperature data were obtained from three meteorological stations located outside the catchment.The potential evapotranspiration is calculated using data from one station with the modified Turk-Ivanov approach after Wendling and Müller (1984), described in Conradt et al. (2013).

Gato Cave spring, Spain
Gato Cave spring (462 m a.s.l.) is one of the main outlets of the karst system of Sierra de Lìbar.It is located in the northwestern part of the province of Málaga, within the boundaries of the Grazalema Natural Park, about 75 km west of Málaga.The altitude of Sierra de Lìbar varies between 400 and 1400 m a.s.l.according to the main north-east-southwest mountain alignments.The area is under a Mediterranean climate with an average annual precipitation of about 1500 mm and is defined by a strong seasonal pattern (Andreo et al., 2006).The site is located within the External Zone of the Betic Cordillera and presents mainly Jurassic limestones and dolomites, Cretaceous-Paleogene marly limestones, and Tertiary clays and sandstones (Flysch) that cover the whole Mesozoic rock sequence.The Jurassic rocks outcrop as anticlinal cores, while the synclines and tectonic grabens are composed of Cretaceous rocks (Martín-Algarra, 1987).The Hundidero-Gato system constitutes a binary karst system where a wide range of well-developed karst landforms are found, such as karrenfields, swallow holes, and caves.These features strongly condition recharge, which is primarily produced in two ways: (i) autochthonous, by direct infiltration of rainfall through carbonate outcrops (20-40 km 2 ) and rainwater that infiltrates through swallow holes in poljes, and (ii) allochthonous, as a contribution from runoff produced in the Gaduares River basins (43.5 km 2 ).This runoff is stored in the Montejaque dam, which was built on karstified limestone, resulting in water losses in the reservoir and, consequently, the artificial recharge of the aquifer through the Hundidero cave (Andreo et al., 2004).
Precipitation and temperature data are from the meteorological station of Grazalema, which is the closest to the catchment and therefore the most representative.Potential evapotranspiration is calculated with the Hargreaves-Samani approach (Hargreaves and Samani, 1985).

Lez spring, France
Lez spring (64 m a.s.l.) is located 15 km north of Montpellier, and the altitude of its catchment varies between 64 and 655 m a.s.l.The Lez catchment is exposed to a Mediterranean climate, which is characterized by hot, dry summers, mild winters and wet autumns.As a large part of the hydrogeological basin is relatively impermeable due to the presence of marl and marly-limestone formation, the effective recharge area of Lez spring covers about 130 km 2 (Fleury et al., 2009) and corresponds to Jurassic limestone outcrops.Localized infiltration occurs through fractures and sinkholes along the basin and through the major geologic fault of Corconne-Les Matelles.The Lez aquifer is subject to anthropic pressure (i.e.exploitation for water supply) with pumping directly into the karstic conduit.The discharge is measured at the spring pool and is regularly zero during low-water periods, when the pumping rate exceeds the natural discharge of the spring.
Precipitation data are from four meteorological stations.Three are located in the catchment and one is located about 5 km west of the catchment.Potential evapotranspiration is calculated with the Oudin approach (Oudin et al., 2005).Temperature data are from the Prades-le-Lez meteorological station.

Qachqouch spring, Lebanon
Qachqouch spring (64 m a.s.l.) is located in the Nahr el-Kalb catchment and originates from a Jurassic karst aquifer.The recharge area is estimated to be about 56 km 2 with altitudes ranging from 60 to over 1500 m a.s.l.(Doummar and Aoun, 2018;Dubois et al., 2020).The catchment is primarily exposed to a Mediterranean climate, with snow influence at higher altitudes (Dubois et al., 2020).The lithology mainly consists of Jurassic karstified limestone and dolomitic limestone (on the higher plateaus), changing to more massive micritic limestone in the lower part of the catchment.The Qachqouch system is characterized by a duality of flow in a low-permeability matrix and a high-permeability conduit system (Dubois, 2017).Potential runoff inflows from higher altitudes and infiltrates downstream into the Jurassic karst aquifer.
Precipitation and temperature data are from two meteorological stations.One is located in the catchment at 950 m a.s.l.The other, with a heated rain gauge, is located 22 km north-east of the catchment at 1700 m a.s.l.(Doummar et al., 2018).Potential evapotranspiration is calculated using data from the 950 m station with the modified Penman-Monteith approach (Allen et al., 1998).

Unica springs, Slovenia
Unica springs (450 m a.s.l.) are the outlets of a complex karst system with an estimated recharge area of about 820 km 2 .The area is under a moderate continental climate and is strongly influenced by snow accumulation and melting.It is sub-divided into three sub-catchments, with a predominance of (i) allogenic infiltration from two sub-catchments drained by sinking rivers flowing through a chain of karst poljes and a river valley and (ii) autogenous infiltration through a karst plateau with highly karstified limestone (Gabrovšek et al., 2010;Kovačič, 2010;Petric, 2010).The poljes follow each other in a descending series at altitudes between 450 and 750 m a.s.l. and are connected in a common hydrological system.Characterized by a network of surface rivers and frequent flooding, this induces a very particular response at the Unica springs with very high hydrological variability (by several orders of magnitude) and delayed and prolonged high-flow values (Mayaud et al., 2019).Low-flow periods are sustained by flows from the karstified limestone aquifer, which reaches heights up to 1800 m a.s.l. and has significant groundwater storage (Ravbar et al., 2012).Part of the discharge is lost due to an underground bifurcation (Kogovšek et al., 1999).When the discharge exceeds about 60 m 3 s −1 and remains high for a few days, a polje downstream of the springs becomes flooded.When the discharge reaches about 80 m 3 s −1 , the flooding reaches the monitoring station, influencing the measurement.The water from the lake is drained by several ponors downstream of the monitoring station, but their absorption capacity is much lower than the discharges of the springs.
Precipitation, snow cover height and height of new snow data were obtained from two meteorological stations located in the catchment.Temperature and relative humidity data are from Postojna meteorological station only.Potential evapotranspiration is calculated using data from the Postojna station with the modified Penman-Monteith approach (Allen et al., 1998).

Artificial neural networks
ANNs are a branch of machine learning, i.e. a technique to learn complex relations from existing data.They imitate the basic functioning of biological nervous systems and similarly consist of mathematical representations of neurons structured and interconnected in layers.Given sufficient data from which to learn, ANNs can establish complex input-output relations with only limited domain knowledge.
In this study, CNNs (LeCun et al., 2015) -a specific model type from the ANN subfield of deep learning (DL)were used.CNNs are predominantly successful in processing image-alike data but are also useful in signal process-ing for sequential data.They usually consist of sequences or blocks of convolutional layers for feature recognition and pooling layers for information consolidation.In the former, filters of a specific size (defining their receptive field) are used to produce feature maps.These feature maps are subsequently downsampled (often by maximum selection) into pooling layers to consolidate the contained information.Several of these blocks with varying properties can be stacked on top of each other, also in combination with other layer types such as batch normalization layers (Ioffe and Szegedy, 2015) to prevent exploding gradients or dropout layers (Srivastava et al., 2014).Lastly, one (or multiple) fully connected dense layers follow to produce the model output.For the models in this study, we used a single one-dimensional convolutional layer with a fixed kernel size (three) and an optimized number of filters.This layer was succeeded by (i) a max-pooling layer, (ii) a Monte Carlo dropout layer (10 % dropout rate) and (iii) two dense layers, the first with an optimized number of neurons and the second with a single output neuron (Fig. 2).Besides a number of filters and number of neurons in the first dense layers, we optimized the training batch size and the length of the input sequence for each simulation step using the Bayesian Optimization library (Nogueira, 2014).The number of minimum and maximum optimization steps was individually selected for each site and can be found in the provided modelling scripts (Cinkus and Wunsch, 2022).To ensure proper learning, the models are regularized with several measures.Hence, early stopping with a patience of 20 steps is applied to prevent the model from overfitting.Except for Qachqouch, where few data are available, the size of the according stopset ranges between one and four annual cycles (see the provided scripts for details).This stopset is considered a part of the calibration period mentioned in Sect.3.4.Further, dropout ensures robust training and serves as another measure against overfitting.We applied the Adam optimizer for a maximum of 150 to 300 training epochs with an initial learning rate of 0.001 and applied gradient clipping to prevent exploding gradients.

Reservoir models
Reservoir models are a conceptual representation of a hydrosystem, which involves the association of several reservoirs that are thought to be representative of the main processes at stake.Each reservoir is characterized by its water height and a flow equation that translates the variations of water height into discharges.The flow equation is a function of a specific discharge coefficient and a positive exponent (different from 1 for non-linear flows), which are defined by calibration against observed data.
Many reservoir models have been developed to study the relation between precipitation and discharge in karst systems (Hartmann et al., 2014).They all differ in complexity with respect to the number of reservoirs and parameters, which need to be well thought out in order to preserve physical realism and limit equifinality on model parameters.Careful sensitivity analyses and uncertainty assessment should be considered along with model results to avoid over-interpretation (Refsgaard et al., 2007).Reservoir models can be seen as a compromise between simulation performance and insight into the functioning of a system.We used the adjustable modelling platform KarstMod to perform reservoir modelling.KarstMod provides a modular, user-friendly interface for simulating spring discharge at karst outlets.The structure of models built using the Karst-Mod platform is based on the conceptual model of a karst aquifer with infiltration and saturated zones (Mazzilli et al., 2019).The infiltration zone (soil and epikarst) drains water from the surface through a vertical network of fissures and conduits.Water storage can occur in the unsaturated zone and local saturation.The saturated zone comprises a dualporosity functioning, with a network of high-permeability fractures and conduits and a low-permeability matrix with a high storage capacity.
In KarstMod, the model structure can include up to four reservoirs.One at the upper level reflects the processes (infiltration, storage and drainage) occurring in the soil and epikarst zone.Three at the lower level can be connected with the first one and correspond to the infiltration and/or saturated zones.The discharge can be simulated with (i) several linear and non-linear water-level-discharge laws, (ii) a hys-teretic water-level-discharge function to reproduce the hysteretic functioning observed in the wet-dry cycles in the unsaturated zone (Lehmann et al., 1998;Tritz et al., 2011), and (iii) an exchange function that aims to reproduce the interactions between the matrix and conduits.More details on the balance equations, the parameters involved and the KarstMod platform in general can be found in Mazzilli et al. (2019) or in the KarstMod User Guide (Mazzilli and Bertin, 2019).
In this study, we first addressed the structure of the models, taking into account our expert knowledge and previous studies.For each site, we examined the major characteristics that determine the functioning of the system and associated the corresponding conceptual modelling.We then modified this base structure according to the performance of the model while trying to maintain physical realism.The most efficient model structures that we obtained after performing the modelling are shown in Fig. 3.
The Aubach spring selected model (Fig. 3a) is close to the conceptual model, with a very reactive transfer function (Q ES ), corresponding to the well-developed conduit network, and a matrix reservoir (M), which in this case mostly reflects the storage properties in the unsaturated limestone.We tested different configurations (lost discharge from upper-level reservoirs and/or pumping in lower reservoirs) to simulate the lost discharges through overflow springs and un- derground flows, but there were no significant increases in model performance.The Gato Cave spring selected model (Fig. 3b) is different from the conceptual model as the platform could not account for the allochthonous recharge in the catchment.The model structure includes a soil available water capacity (E min ), matrix and conduit compartments (M and C) and matrix-conduit exchanges (Q MC ), which may translate the processes occurring through the dam.The Lez spring selected model (Fig. 3c) is accurate with the conceptual model and includes an overflow transfer function (Q loss ), matrix and conduit compartments (M and C), matrix-conduit exchanges (Q MC ) and pumping into the main conduit (Q pump ).We considered a low soil available water capacity (E min ) as it greatly increased the performance of the model.The Qachqouch spring selected model (Fig. 3d) is consistent with previous conceptual models that considered many different response times.The model structure features a very reactive transfer function (Q ESO ), matrix and conduit compartments (M and C), matrix-conduit exchanges (Q MC ) and a soil available water capacity.The multiple different transfer functions help to reproduce the reactive and dampened responses of the Qachqouch karst aquifer.The Unica springs selected model (Fig. 3e) is significantly simpler than the conceptual model, which includes polje flooding, allochthonous recharge, overflow springs and matrix-conduit exchanges.We only retained a very simple structure as it was the best trade-off between physical realism and model performance.The very reactive transfer function Q ESO allows reproduction of fast flows through conduits, while the matrix reservoir (M) likely translates processes occurring in the matrix and surface flooding.

Input data
Input data are the time series that are used for simulating karst spring discharge.They can be derived from either a direct observation (e.g.observed discharge, temperature, sinking stream discharge or pumping) or a calculation from raw input data (e.g.potential evapotranspiration derived from temperature).The nature of input data usually differs between the ANN and reservoir modelling approaches, as ANN models tend to make good use of direct observations, whereas reservoir models often require one to pre-process the raw input data.We decided to work with raw input data to ensure equitable performance between the ANN and reservoir models.The raw input data were either used directly or pre-processed, depending on the modelling approach.
The data used for each modelling approach and site are summarized in Table 2. Observed discharge time series were used directly (without further pre-processing) in the ANN and reservoir models.In the case of Lez spring, the models were simultaneously calibrated on the spring discharge (Q) as well as on the water level in the aquifer (Z).Furthermore, the pumped discharge time series in reservoir C (Q pump , Fig. 3c) was used as an input.Precipitation time series were used differently as there are often several meteorological stations per study site.For ANN models, precipitation time series were used as raw input P raw , except for Lez spring, where the individual raw precipitation data had too many missing values, so we used the same input as the reservoir model (P in ).In the case of Aubach, Qachqouch and Unica, P raw includes all the precipitation time series from the different meteorological stations (Table A1).For reservoir models, the precipitation time series were either (i) used directly if there was no snow dynamics in the catchment and only one meteorological station was available (Gato Cave), (ii) pre-processed with Thiessen's polygon interpolation (Appendix B) if there were several meteorological stations (Lez), (iii) pre-processed with a snow routine (Appendix C) to simulate snow accumulation and melting over the catchment (Aubach) if snow dynamics could not be neglected or (iv) pre-processed with both Thiessen's polygon interpolation and a snow routine (Qachqouch, Unica).For reservoir models, evapotranspiration processes were considered using time series of potential evapotranspiration, which were calculated for each site using different methods depending on the available meteorological data, the climate of https://doi.org/10.5194/hess-27-1961-2023 Hydrol.Earth Syst.Sci., 27,2023  We handled missing values in the different time series as follows: (i) temperature gaps were linearly interpolated, (ii) precipitation and evapotranspiration gaps were considered to be equal to 0 and (iii) discharge gaps were interpolated with a Lagrange polynomial function.Maximum observed gaps for precipitation, temperature, discharge and evapotranspiration time series are detailed in Table 2.Note that, (i) for Lez spring, we observed maximum gaps of 17 and 16 d for pumped discharge and piezometric level, respectively, and (ii) for Unica springs, there are no missing values in the Cerknica new snow height (NS) time series.

Model calibration and simulation
The calibration period is the period used for selecting the parameters that provide the best results according to the performance measure.The validation period is intended to assess the relevance of the parameters over a time interval that is not used for calibration.In the domain of the ANN modelling, the validation is usually denoted as a testing period.However, we unify the terminology at this point.The calibration period is again split into three different parts in the case of ANN modelling, (i) to train the model, (ii) to prevent the model from overfitting, and (iii) to optimize its hyperparameters.We defined the same calibration and validation periods for both modelling approaches (Table 3), which ensures fair initial conditions and a meaningful comparison of the results.We have chosen the periods in a way to maximize the length of the calibration periods, which allows for relevant model results (especially in ANN models).In the reservoir model, we considered a short warm-up interval at the beginning of the calibration period for the model to adjust and reach an optimal state.
We calibrated the models by applying the mean squared error (MSE) to simulated and observed discharge time series.For Lez spring, we used a composite function of discharge and water level in order to consider both variables in the same modelling process.
Multiple simulations were carried out for each modelling approach at each site.The obtained simulated discharge (or water level) time series corresponds to the mean of the distribution of simulated values at each time step.We also considered the uncertainties in the model prediction by calculating the 90 % confidence interval, whose limits correspond to the 0.05 and 0.95 quantiles of the distribution at each time step.
In KarstMod (reservoir models), the retained simulations correspond to all the results satisfying a maximum MSE threshold on the calibration period for a 6 h model run.The confidence interval reflects the uncertainty in the parameters used in the model, which are not fixed but are defined as a range (e.g.catchment area = 150 to 200 km 2 ).In the case of ANN models, we used a model ensemble of 10 models based on different random number generator seeds for model initialization.Using the Monte Carlo dropout layer, for each of the ensemble members a total of 100 simulation results were generated.

Model evaluation
We evaluated the performance of the models using the MSE and several performance criteria that assess different aspects of the discharge: modified Kling-Gupta efficiency (KGE ), KGE components (r, γ , β) (Kling et al., 2012) and diagnostic efficiency (DE) (Schwemmle et al., 2021).These criteria were all applied to the whole discharge regime but also to sub-regimes of high-and low-flow conditions (with the exception of DE, which already takes sub-regimes into account).For Lez spring, we also applied the MSE and KGE criteria to water level.Model performance is usually evaluated on both calibration and validation periods for reservoir models.However, this approach is not suited to ANN models, for which the calibration period corresponds to the learning period of the model.Thus, we chose to only evaluate and compare the reservoir and ANN models on their validation periods.
The KGE has gained in popularity as it aims to address some limitations of the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970), i.e. (i) the discharge variability is underestimated, (ii) the mean of observed values is not a meaningful benchmark for variables with high variability, and (iii) the normalized bias is dependent on the variability (Gupta et al., 2009, Willmott et al., 2012).The KGE is based on the KGE and aims to ensure that bias and variability are not crosscorrelated by using the coefficient-of-variation ratio (γ ) instead of the standard deviation ratio (α): with r the Pearson correlation coefficient between the simulated and observed discharge, β the ratio between mean simulated and mean observed discharge, and γ the ratio between simulated and observed coefficients of variation of discharge.
The three components of KGE help to evaluate different aspects of a model: (i) r is related to shape and timing (Santos  A KGE score equal to 1 means a perfect match between simulated and observed discharge, while a score lower than −0.41 indicates that the model is worse than using the observed mean as a predictor (Knoben et al., 2019).
The DE criterion is intended to help define the weaknesses of a model.It is based on constant, dynamic and timing errors.DE is proposed as a numerical measure (ranging from 0 to ∞, with 0 indicating a perfect model) but can also be visualized on a polar plot that effectively differentiates error contributions.The overall error is calculated with the following equation: with B rel and |B area | the measures for constant and dynamic errors, respectively.As these measures are based on the flow duration curve, they give information in terms of exceedance probability.Details of their calculation can be found in Schwemmle et al. (2021).
The performance criteria applied to high-and low-flow conditions are denoted by the lower script indices "L" and "H", respectively.These criteria allow the performance of the models to be evaluated over different flow regimes (i.e.dry/intermediate, wet).Discharge thresholds were set manually based on our knowledge of the system and a careful assessment of the distribution of discharge values.They are equal to 1, 2, 0.8, 5 and 20 m 3 s −1 for Aubach, Gato Cave, Lez, Qachqouch and Unica springs, respectively.

Results and discussion
The obtained models and their confidence intervals for the two approaches and each test site are presented in Fig. 4 for discharge and Fig. 5 for water level (Lez spring).Their performance -assessed with multiple criteria -are shown in Fig. 6, Table 4 and Table D1.The DE polar plots for each site are presented in Fig. 7.

Aubach spring
The ANN model is very good, with a KGE of 0.88 (Table 4).The snow-influenced period from April to mid-June is accurately modelled, as are the peaks in summer and early autumn (Fig. 4a).The highest peaks of the whole time series occurring in February, July and November are only slightly underestimated.The model is balanced and accurate in volume (β = 0.93), variability (γ = 1.01) and shape and timing (r = 0.91).The model is very good for simulating high flows and is decent in low flows but could be improved, especially in shape and timing (r H = 0.84, r L = 0.66).The slightly higher value of γ L (1.26) may be related to the tendency of the model to "oscillate" during low/medium flows (e.g. in September, Fig. E1a).This wave-like behaviour may be related to a high sensitivity to precipitation events or to inappropriate learning from other data.DE is very good (0.14, Fig. 7a).The model shows negative dynamic and constant errors with a higher share of high flows, which points to a small underestimation of the occurrence of high flows.
The reservoir model is decent, with a KGE of 0.69 (Table 4), but the model fails to accurately reproduce the discharges in all the seasons.There is a deficit in water during winter/early spring and an excess during spring (Fig. 4a).The model is balanced and accurate in variability (γ = 0.91) Figure 6.Performance of the ANNs and reservoir models in the validation period according to different performance criteria.Exact values can be found in Table 4.
Table 4. Details of indicator values for the reservoir and ANN models in the validation period.For each site, the simulations are evaluated with different performance criteria on total, high-flow and low-flow conditions.Values in bold correspond to the better score between the ANN and reservoir models.

Spring
Flow  The simulated high flows are decent, although they can be improved in shape and timing (r H = 0.68) and volume (β H = 0.70).DE is good (0.26, Fig. 7a).The model has negative dynamic and constant errors, with a higher share of high flows.These errors can be either due to (i) a miscalibration of the snow routine, retaining too much water as snow in winter and thus releasing too much in warmer periods, (ii) the uncertainties related to the meteorological data in mountainous catchments or (iii) the snow dynamics which cannot be taken into account within the KarstMod platform, e.g. by adding a snow storage above the epikarst (Chen et al., 2018).
In October, a series of peaks is not well captured by the outputs of both models (Fig. 4a).A plausible explanation is that the inputs do not capture the respective local precipitation events due to the locations of the climate stations outside the catchment.
The modelling of discharges from Aubach spring is challenging due to the large elevation differences and the heterogeneity of precipitation on the catchment.This makes it difficult to provide accurate data to the model, especially with regard to snow dynamics.The reservoir model is particularly confronted with these aspects because (i) it can only handle a single precipitation input (from one weather station or interpolated from several stations) and (ii) the snow dynamics must be simulated by a snow module.As these pre-processings cannot really catch the spatial heterogeneity of complex snow processes, they strongly limit the model performance (Çallıet al., 2022).Leaving aside the mismatches related to inadequate meteorological inputs, the structure of the reservoir model seems appropriate for simulating the hydrological response of the spring.In contrast, the ANN model is able to consider snow dynamics without any preprocessing, using only the precipitation and temperature time series during calibration.It shows great versatility with respect to the input data, similar to that of a two-dimensional approach.

Gato Cave spring
The ANN model is very good, with a KGE of 0.91 (Table 4), but the model struggles to reproduce the discharges during flood events (Fig. 4b).Very high peaks are either overestimated (e.g.May 2012, April 2013, March 2014) or underestimated (e.g. December 2011, November 2012, March 2013).The model is balanced and accurate in volume (β = 0.98), variability (γ = 0.97) and shape and timing (r = 0.92).The model is good for simulating high flows and is somewhat decent in low flows.It shows a slight lack in shape and timing in both high and low flows (r H = 0.82, r L = 0.82) and also seems to overestimate low flows (β L = 1.32).In the same way as the Aubach ANN model, the slightly high variability (γ L = 1.19) may be related to the "oscillations" that can be observed, especially in medium and low flows (e.g.January 2012, May 2013, Fig. E1b).
The reservoir model is very good, with a KGE of 0.85 (Table 4), although the model tends to slightly underestimate the discharges during high-flow events (β H = 0.82; Fig. 4b).This seems to happen when precipitation occurs during several days without reaching really high values, which may indicate either (i) some kind of hysteresis functioning with flow occurring after a connection has been made in the system or (ii) inflows into the system that are not taken into account in the model.The model is balanced and accurate in variability (γ = 0.92) and shape and timing (r = 0.96) but generally underestimates volume (β = 0.88).The model has good performance in high flows and is decent in low flows.After flood periods, the model seems to simulate a slower draining than observed -higher volume (β L = 1.19) and variability (γ L = 1.27) of low/medium flows -resulting in inaccurate recession periods for which the discharge is overestimated (e.g.November 2011, January 2013, April 2014).
Some periods like November 2012 or February 2015 are not simulated very well by both models (Fig. 4b), which may be related to uncertainties in the meteorological data input.DE is decent (0.39) for the ANN model and good (0.32) for the reservoir model (Fig. 7b).Both models have a positive dynamic error with a higher share of low flows, which highlight a small underestimation of the occurrence of low flows.
The modelling of discharges from Gato Cave spring shows that both approaches can have great performance given few modelling constraints.Raw precipitation input was used in both models, thereby avoiding additional uncertainties from interpolation or snow pre-processings.

Lez spring
The ANN model is good, with a KGE of 0.7 for discharge (Table 4) and 0.89 for water level (Fig. 5).The high piezometric levels (above 55 m a.s.l.) seem a bit too sensitive to precipitation events, especially at the end of 2019 (Fig. 4c).In discharges, the model is accurate in variability (γ = 1.13) and shape and timing (r = 0.93) but underestimates volume (β = 0.74).The overall underestimation of volume mainly comes from high flows (β H = 0.75) as they are the most represented in the time series.The model is decent in high flows but has too much variability (γ H = 1.38).In low flows, the model performs poorly, mainly due to a high underestimation of volume (β L = 0.64) and insufficient shape and timing (r L = 0.64).
The reservoir model is good, with a KGE of 0.7 for discharge (Table 4) and 0.75 for water level (Fig. 5).However, the model fails to reproduce the observed discharge for several months for the period between September 2020 and March 2020 (Fig. 4c).During dry periods, there is a too high deficit in the lower reservoirs, leading to a strong delay in the spring response when fresh precipitation occurs -the C reservoir having to be replenished beforehand.The model is balanced and accurate in shape and timing (r = 0.88) but overestimates variability (γ = 1.15) and underestimates vol-ume (β = 0.77).The model is decent in high flows but has poor variability (γ H = 1.46) and shape and timing (r H = 0.68) and also underestimates volume (β H = 0.78).In low flows, the model has too much variability (γ L = 1.37) and middling shape and timing (r L = 0.76).The piezometric levels are satisfactory when the spring is flowing (greater than 65 m a.s.l.) but lose accuracy during dry periods.The model could not reproduce the changes in flow dynamics at 46 m a.s.l.(August 2019, August 2020, Fig. 5).Also, the draining of the aquifer seems to be simulated more slowly than observed (July 2018, July 2019), which can be a result of the model trying to fit the aforementioned periods during calibration.
In both models, the poor overall KGE value in low/medium flows is probably due to the small occurrences of low discharges (except 0), thus inducing a high error in volume.DE is good for both the ANN (0.31) and reservoir models (0.35) (Fig. 7c).Both models have negative dynamic and constant errors with a higher share of high flows, which highlight an underestimation of the occurrence of high flows.
The time series is generally characterized by distinct dry periods without any recharge due to the anthropogenic pumping of water into the saturated zone of the aquifer.These periods are simulated fairly accurately by both models, but the ANN model is better at simulating first floods after or during dry periods.Several boreholes at the north of the spring showed flow-bearing structures at 50 m a.s.l.(Dausse et al., 2019).These fast water transfers could explain the rapid increases in observed piezometric level and the reactive spring responses.We also suspect an evolution of the carbonate's facies with depth, which could affect the effective porosity of the medium and induce different flow dynamics.These aspects are not considered in the reservoir model, which results in poor simulations when the water level is below 60 m a.s.l.However, this failure provides information on the structure of the aquifer, which is valuable for research.On the other hand, the ANN model was successful in learning these particular behaviours, especially as it only had a medium learning time of about 8 years.

Qachqouch spring
The ANN model is decent, with a KGE of 0.67 (Table 4), but strongly overestimates low flows at the beginning of December and then underestimates the flood peak at the end of the month (Fig. 4d).The model slightly underestimates volume (β = 0.87) and is lacking in variability (γ = 0.75) and shape and timing (r = 0.82).The high flows are poorly simulated, but the low flows are well fitted, although volume is slightly overestimated (β L = 1.21).The oscillations of the simulated discharges (Fig. E1c) may appear because the model does not account for the time needed for the aquifer to replenish and generate an increase in discharge at the spring.
The reservoir model is very good, with a KGE of 0.83 (Table 4).At the end of the dry period, the low flows are overestimated and the first flood is underestimated (Fig. 4d). https://doi.org/10.5194/hess-27-1961-2023 Hydrol.Earth Syst.Sci., 27,2023 This may be due to heterogeneous precipitation occurring in highly transmissive parts of the catchment.In this case, the soil available water capacity (E min ) -which is necessary for a good simulation of low-flow periods -may not be representative of the whole catchment, thus inducing a more dampened response than observed.The model is balanced and accurate in volume (β = 1.05) and shape and timing (r = 0.93) but slightly lacks in variability (γ = 0.85).The model is decent in high flows but has middling variability (γ H = 0.57), which can be due to the underestimation of the late December flood peak.The low flows and recession periods are overestimated (β L = 1.20 and γ L = 1.19).DE is bad for the ANN and reservoir models (0.77 and 0.84, respectively, Fig. 7d).Both models have a positive dynamic error with a higher share of low flows, which highlight an underestimation of the occurrence of low flows.Here, the positive dynamic error is influenced by the constant underestimation of the observed discharge during the dry period (October-December 2019), accounting for more than 50 % of the observations.
The very short data length is particularly detrimental to the ANN model as the learning period is only about 4 years.Furthermore, even when data are available, there is a significant amount of time without (relevant) discharge, for which no input-output relation can be learned.Due to the characteristics of the discharge time series, it can be assumed that a much longer time series of daily values would be needed to successfully simulate the discharges of Qachqouch spring.On the other hand, the reservoir model seems more appropriate for simulating Qachqouch spring discharges even with the limited data available.This highlights the strength of conceptual modelling in taking into account recharge processes and reservoir replenishment, even on a short dataset with long dry periods.

Unica springs
The ANN model is good, with a KGE of 0.73 (Table 4).The model is accurate in volume (β = 1.03) and shape and timing (r = 0.93) but insufficient in variability (γ = 0.74).The model is good at simulating high flows but is slightly lacking in volume (β H = 0.87), variability (γ H = 0.89) and shape and timing (r H = 0.79).The model is poor at simulating low flows, which are often significantly overestimated (β L = 1.92), especially the recession periods, which systematically have a slower draining (Fig. 4e).The overestimation of low flows could be the result of the model trying to better fit the high-flow periods during training, which may shift the whole discharge curve slightly towards the upper limits.The model also seems to be too sensitive regarding precipitation events, hence the wave-like behaviour of the simulated time series (Fig. E1d).DE is bad (0.72, Fig. 7e).The model has a negative dynamic error and a positive constant error with a higher share of low flows, which highlights an overestimation of the occurrence of low flows.
The reservoir model is good, with a KGE of 0.77 (Table 4).The model is balanced and accurate in variability (γ = 0.89) and shape and timing (r = 0.93) but has shortcomings in volume (β = 0.77).In some winter months (December 2017, March 2018), the model has a delayed response of the rising limb (Fig. 4e), which may be due to a slightly wrong parametrization of the snow routine.The model is good in high flows, but shape and timing (r H = 0.81) and volume (β H = 0.74) could be improved.The model accurately simulates low-flow volume but has too much variability (γ L = 1.25) and is middling in shape and timing (r L = 0.78).The difficulty of the model in reproducing the depletion of the capacitive function may be due to the size and complexity of the catchment and the very specific influence of poljes draining over the catchment, which cannot be simulated within the KarstMod platform.DE is very good (0.18, Fig. 7e).The model has negative dynamic and constant errors with a higher share of high flows, which highlight a small underestimation of the occurrence of low flows.
Both models were unable to reproduce the plateau-like behaviour observed at very high discharges (Fig. 4e), which is due to the flooding of a polje at Unica springs that influences the monitoring station (Mayaud et al., 2022).They are simulated as separate peaks, which is false in terms of model accuracy but may also have some underlying conceptual truth.Only two meteorological stations were considered, which is very few for such a large catchment (820 km 2 ).Moreover, the major recharge area (Javorniki plateau) does not have any direct climate data available.Both models have difficulties in consistently reproducing the very particular hydrological functioning of the system (influenced by polje and surface water).The ANN model is more reactive, which helps in reproducing the dynamics of high flood peaks but hinders the simulation of low flows.The reservoir model has better dynamics for medium and low flows but does not always manage to reproduce high flood peaks, which may be a consequence of the simple structure of the model.

Sources of uncertainties
Both the ANN and reservoir models have similar trends in water volume and hydrological variability (Fig. 6).Overall volumes are great, with β ranging from 0.74 to 1.05.Highflow volumes are systematically underestimated, with β H ranging from 0.70 to 0.98.Low-flow volumes are mainly overestimated -β L ranging from 0.99 to 1.92 -except at Lez spring, with β L of 0.64 and 0.72.Overall hydrological variability is mainly underestimated, with only Lez and Aubach springs having γ values slightly above 1.High-flow hydrological variability does not show a distinct trend, being either overestimated or underestimated depending on the studied system.Low-flow hydrological variability is mainly overestimated, with γ L ranging from 0.95 to 1.37.These overestimations may be due to (i) improper -and generally softersimulation of recession periods or (ii) too high sensitivity to precipitation events, especially in ANN models, inducing discharge oscillations during recession and low-flow periods.The performances in shape and timing (r) are mixed between the two approaches.They depend mainly on the system studied and the quality of the model but also on the hydrological period considered.
These similar results between the two approaches highlight a common struggle to simulate extreme water conditions.As the ANN and reservoir modelling approaches are very different, explanation must be sought in common factors to both approaches, such as input data, observed data, internal/external system dynamics or the consideration of extreme events during calibration.
-Input data.Generally, in one-dimensional modelling approaches, input data only come from at most a few meteorological stations and do not accurately reflect the heterogeneity of meteorological processes in a catchment.Spatial variability of precipitation can be very high and not fully captured by meteorological stations, (i) resulting in different travel times and generating a different response at the spring (Ollivier et al., 2020) and (ii) hindering the simulation of very high flows (Pereira et al., 2014;Hohmann et al., 2021) -especially in areas where strong convective storms are frequent (Lobligeois et al., 2014).McMillan et al. (2018) suggested that uncertainties in precipitation data are about 0-10 % at point scale but can go up to 40 % when considering interpolation uncertainties.Temperature data are generally less heterogeneous than precipitation, although they can be affected by complex topography (Aalto et al., 2017).In the case of snow-covered areas, this can result in strong uncertainties in the timing of snow accumulation and melting (Zhang et al., 2016) and therefore the recharge of the aquifer.The uncertainties related to precipitation and temperature input in one-dimensional hydrological models can thus -partly -explain the difficulties in reproducing extreme events (Lobligeois et al., 2014;Huang et al., 2019;Ollivier et al., 2020;Bittner et al., 2021), especially high flows.
-Observed data.Discharge time series are generally derived from water height measured at the spring, using water level-discharge calibration curves.Numerous uncertainties are related to this determination method (Pelletier, 1988), including extrapolation errors for extreme values (Di Baldassarre and Montanari, 2009;Moges et al., 2021).Extreme events occur more rarely and are harder to measure, especially high flows.This can result in inaccurately observed discharge time series that are difficult to reproduce with simulations (e.g.Unica springs at very high flows).The uncertainties related to discharge measurements are highly dependent on the quality of the gauging station and usually range between 10 and 40 % (McMillan et al., 2018).Although they are expected to be higher in a karst context (West-erberg et al., 2016), some authors reported uncertainties of about 20 % (Jeannin et al., 2021) or 10-15 % (Katz et al., 2009).
-Internal/external system dynamics.Karst systems are inherently complex media.Internal dynamics are not necessarily captured in hydrological models (Sidle, 2006(Sidle, , 2021;;Hartmann et al., 2017) and can be related to numerous processes in karst media, e.g. the saturation state of the system, surface water exchanges, temporary storage of water, incoming or outgoing flows from/to another aquifer, change in physical properties beyond a certain level, or karst features such as poljes or sinkholes.These complex processes do not occur systematically and can change from year to year (Ollivier et al., 2020).This can lead to difficulties in training ANN models or in adapting the structure of reservoir models.
-Extreme events during calibration.The ANN and reservoir models are both trained on a calibration period.By definition, extreme events are rare.Therefore, models may have fewer opportunities to successfully fit model parameters to such events (Seibert, 2003), preferring more balanced parameters that are appropriate to the rest -and most -of the time series (Onyutha, 2019).In addition, models are generally calibrated over the whole time series using one performance criterion against observed data.In this case, extreme events are not explicitly emphasized in the objective function.A solution could be to give more weight to the reproduction of certain parts of the time series, such as flood and dry periods (Singh and Bárdossy, 2012), or to use different model optimization techniques, such as cross-validation (Wilks, 2011).
Both approaches can also benefit from a careful assessment of the calibration period.For example, the ANN model is thought to overestimate low flows in Unica springs by trying to fit the plateaus at very high discharges.In Lez spring, the reservoir model simulates a slower draining in the aquifer (piezometric level) because it does not account for a potential change in underground dynamics.These limitations emphasize the need for a meticulous investigation of the results in regard to the characteristics of the system and the input data.Such errors can be avoided or lessened by excluding abnormal periods during the calibration, which can be justified by inaccurate input data or limitation in the conceptual model.

Comparison of general model properties
The main findings of this study are presented in Table 5.The extensive analysis of high and low flows did not show a clear trend but did reveal slight differences between the two approaches for this study.For high-flow periods, results slightly favour the ANN approach (except for Qachqouch spring), with consistently accurate volumes and shape and timing https://doi.org/10.5194/hess-27-1961-2023 Hydrol.Earth Syst.Sci., 27,2023  (Fig. 6).ANN models also tend to achieve higher flows than reservoir models (Fig. 4); due to the noticeable/strong karstification of the studied systems, the high occurrence of high discharge data may benefit the learning of the ANN models.
On the other hand, reservoir models are more dependent on the relevance and the quality of the input data pre-processing and thus can be more affected by the uncertainties presented in Sect.4.2, especially regarding high flows.For low-flow periods, results slightly favour the reservoir approach (except for the Aubach spring), with good estimation of volumes and only a slight overestimation of the hydrological variability (Fig. 6).The conceptual representation of the aquifer with reservoirs and transfer functions may help to simulate the recharge process (especially for inertial systems): a precipitation event will not directly result in a discharge increase at the spring if the reservoir is not fully saturated.On the other hand, ANN models seem to not always account for the time needed for the aquifer to replenish, inducing wave-like behaviours during medium-and low-flow periods (Fig. E1), which can hinder the simulation of low flows.The water level (Lez spring) was correctly simulated by both approaches, with only some imprecision during dry periods (Fig. 5).ANN models are flexible and provide numerous advantages over reservoir models with respect to input data.They can easily integrate meteorological processes (e.g.snow dynamics, evapotranspiration) without any pre-processing of the raw data, whereas this is generally calculated beforehand in reservoir models.It is also possible to add a large amount of raw data in ANN models and let the model select those relevant for a good simulation, which makes the modelling easier and can also give insight into the input data or catchment characteristics (Wunsch et al., 2022).This helps to avoid additional uncertainties related to (i) arbitrary decisions on the raw data (e.g.choosing precipitation from one station rather than another), (ii) interpolation (when data from several meteorological stations over a catchment are available) or (iii) pre-processing (e.g.snow routine, potential evapotranspiration).This great flexibility regarding in-put data makes ANN models close to a two-dimensional or semi-distributed approach.If necessary, the transition between one-dimensional and two-dimensional input data are comparably easy, whereas in reservoir models this usually involves changing or adapting the tool.
Reservoir models do not need a long calibration period to provide accurate and relevant simulation results.In contrast, a very short time series or a short time series with long dry periods can be detrimental for the learning of the ANN model, which seems to benefit from long periods of relevant discharge (at least 5 years).We have seen that the ANN model has difficulties in simulating the flows of Qachqouch spring, mainly because of (i) the short calibration period and (ii) the long low-water periods which are not relevant for training the model.On the other hand, the reservoir model has been able to integrate key elements (e.g.double porosity, matrix-conduit exchanges, fast conduit transfer in wet periods) by relying on the conceptual model.
The ANN approach does not require any prior knowledge of the system and inherently considers model structure and parameters.This makes the modelling process easier and faster, thus saving the operator a great amount of time.On the other hand, reservoir models require a significant investment in reading the literature, analysing expert knowledge, and doing trial and error during model design.Moreover, the cost of a change in structure is not trivial.Depending on the modelling platform (e.g.software, raw code), it may take more or less time -or even be impossible -to take certain elements into account.For example, in this study, the KarstMod platform does not allow one to (i) consider different porosities in the same reservoir, leading to difficulties in modelling the piezometric levels during dry periods for the Lez system, (ii) use different E min values, which may benefit the Qachqouch model, (iii) consider polje and surface water influence in the Unica model, or (iv) consider snow dynamics in the structure of the Aubach model.
Both the ANN and reservoir models can be used for research purposes.Model structure, transfer functions and pa-rameters are explicitly expressed in reservoir models, which can provide valuable insights into the hydrogeological structure of the reservoir and the internal processes of the karst system, e.g.(i) the relative contributions of fast and slow flows, (ii) the draining of each compartment, (iii) the activation thresholds of the overflow transfer functions (either to the spring or out of the system), (iv) the changes in flow dynamics with respect to dry and wet periods and (v) the exchanges between the matrix and conduit compartments.In comparison, ANN models act rather as a "blackbox" whose parameters are more difficult to exploit and associate with the hydrological functioning of a system.However, the ANN model can help to explore input data, thus indirectly providing insights into catchment delineation or external recharge processes (Wunsch et al., 2022).

Conclusion
Our objective was to provide researchers and stakeholders with guidelines for choosing either artificial neural networks or reservoir models to simulate karst spring discharges, depending on their purpose, data availability, data length and time budget.Five test sites were considered, allowing different hydrological conditions and input data to be studied.The results of the ANN and reservoir models were compared on the basis of several performance criteria, distinguishing between high-and low-flow conditions.Both models succeeded in simulating spring discharges satisfactorily but struggled to reproduce extreme events (drought, flood), generally overestimating low flows and underestimating high flows.This can be related to common problems in hydrological modelling regarding uncertainties in the input data or observed data and internal/external system dynamics or the consideration of extreme events during calibration.
ANN models seem robust for reproducing high-flow conditions and reservoir models for reproducing low-flow conditions.The input data are also a critical factor in choice.Reservoir models can work with relatively short time series, while ANN models need a minimum number of relevant years to learn the functioning of a karst system.On the other hand, ANN models are very flexible in the format and amount of input data.They can learn many meteorological processes with no prior need for pre-processing the raw data and use several time series for a single variable.This avoids arbitrary interpolation decisions (e.g.precipitation between several stations), parameter definitions (e.g.snow routine) or meteorological calculation (e.g.potential evapotranspiration) and allows these aspects to be integrated into the model calibration.
Both the ANN and reservoir models can be used for karst aquifer management, flood forecasting and system characterization.ANN models may be more appropriate for simulating high flows, delineating catchments, or assessing external recharge processes.Reservoir models seem more robust for simulating low flows and gaining insights into the internal functioning of a system.ANN models can also be interesting time-wise as (i) they do not require any prior knowledge of the system and (ii) model design is more flexible regarding input data and system functioning.
One of the difficulties this paper faced was distinguishing the general limitations of the reservoir modelling approach from those specific to the chosen modelling platform.In comparison to user-defined models, the modelling platform constrains the structure and the transfer functions of the conceptual model.Remaining within the KarstMod platform provided the time advantages of a turnkey toolbox (which are widely used in research and by stakeholders) but limited the possibilities of the conceptual models.Code and data availability.We provide complete codes for the ANN models and .propertiesfiles for reservoir models on Github (https://doi.org/10.5281/zenodo.7242077,Cinkus et al., 2022).Due to redistribution restrictions from several parties, a dataset cannot be provided.However, the data are available from the local authorities upon request.Aubach spring discharge time series and meteorological data from the Diedamskopf and Walmendinger Horn stations are available from the office of the federal state of Vorarlberg -division of water management.Meteorological data from the Oberstdorf station are available on the DWD Open Data Server (https://opendata.dwd.de/,DWD, 2022).The Gato Cave spring discharge time series is available from the Confederación Hidrográfica de las Cuencas Mediterráneas Andaluzas (https://hispagua.cedex.es/instituciones/confederaciones/andalucia,Cuenca Mediterránea Andaluza, 2022) and meteorological data are available in the "Datos a la carta" section in Consejería de Agricultura, Pesa, Agua y Desarrollo rural (http://www.redhidrosurmedioambiente.es/saih/ presentacion, Consejería de Agricultura, Pesa, Agua y Desarrollo rural, 2022).The Lez spring discharge time series is available on the OSU OREME website (https://doi.org/10.15148/CFD01A5B-B7FD-41AA-8884-84DBDDAC767E,SNO KARST, 2019), water level time series can be requested from Montpellier Méditerranée Métropole, and meteorological data are available on request from Météo-France.The Qachqouch discharge time series and

Figure 2 .
Figure 2. Selected structure of the CNN model.

Figure 4 .
Figure 4. Observed and simulated spring discharge time series with (i) 90 % confidence intervals (CIs) in the validation period and (ii) the threshold for high and low flows used for the calculation of the performance criteria.(a) Aubach, (b) Gato Cave, (c) Lez, (d) Qachqouch and (e) Unica springs.

Figure 5 .
Figure 5. Observed and simulated spring water level time series with 90 % CIs in the validation period (Lez spring).

Figure E1 .
Figure E1.Examples of wave-like behaviour produced by the ANN model in the (a) Aubach, (b) Gato Cave, (c) Qachqouch and (d) Unica springs.

Table 1 .
Summary of the studied springs and areas.Q mean corresponds to the mean observed discharge and P an to the annual mean precipitation over the considered period.

Table 2 .
Summary of input data.(i) P raw , (ii) P in and (iii) P sr refer to (i) raw precipitation data, (ii) precipitation data interpolated with Thiessen's polygon method and (iii) precipitation data redistributed by applying the snow routine.Q obs , Z obs and T refer to observed discharge, observed water level and temperature, respectively.ET (evapotranspiration) refers to either PET (potential evapotranspiration) or AET (actual evapotranspiration) time series.Q obs , P raw , T max , T min , T med Q obs , P Q obs , Q pump , Z obs , P in , T sin Q obs , Q pump , Z obs , P P raw , T and T sin data are from the Diedamskopf, Oberstdorf and Walmendinger Horn meteorological stations.b T max data are from the 1700 m meteorological station.c P sr data are calculated with the data from the Diedamskopf station. a

Table 3 .
Calibration and validation periods.

Table 5 .
Advantages and drawbacks of ANNs and reservoir models, based on the results of this research.