Technical review of large-scale hydrological models for implementation in operational flood forecasting schemes on continental level

Uncertainty in operational hydrological forecast systems forced with numerical weather predictions is often assessed by quantifying the uncertainty from the inputs only. However, part of the uncertainty in modelled discharge stems from the hydrological model. A multi-model system can account for some of this uncertainty, but there exists a plethora of hydrological models and it is not trivial to select those that fit specific needs and collectively capture a representative spread of model uncertainty. This paper provides a technical review of 24 large-scale models to provide guidance for model selection. Suitability for the European Flood Awareness System (EFAS), as an example of an operational continental flood forecasting system, is discussed based on process descriptions, flexibility in resolution, input data requirements, availability of code and more. The model choice is in the end subjective, but this review intends to objectively assist in selecting the most appropriate model for the intended purpose.

population and per capita wealth, which has resulted in increased exposure of both assets and people in flood-prone areas (EEA, 2010). Following the devastating floods in the Elbe and Danube basins in 2002, the European Commission launched the development of a pan-European Flood Awareness System (EFAS; Bartholmes et al., 2009; Thielen et al., 2009) to improve disaster risk management through early warning information on a European scale and subsequently reduce damages in the member states.
EFAS provides information on floods in Europe through a fully operational forecasting system. Currently, the system makes use of multiple meteorological forecasts, including ensemble prediction systems, to produce probabilistic flood forecasts and to assess the uncertainty of the forecasts. However, uncertainty in hydrological modelling is not limited to the meteorological forcing, but stems from a number of different sources: input data (including forcing), parameters, model structure and evaluation data, e.g. discharge (Refsgaard and Storm, 1996; Thielen et al., 2010). Several hydrological modelling studies have shown that the uncertainty introduced by model parameters and structure can be significant (e.g. Butts et al., 2004; Haddeland et al., 2011; Lohmann et al., 2004). For example, Haddeland et al. (2011) showed that an ensemble of 11 global models forced with the same data exhibited significant differences in the partitioning between evaporation and runoff, with global runoff estimates ranging from 290 to 457 mm yr⁻¹. In a study of 16 lumped catchment models in 29 French catchments, Velázquez et al. (2011) found that a multi-model framework for probabilistic flood forecasting outperformed both a single hydrological model driven by ensemble meteorological data and multiple hydrological models driven by deterministic meteorological forecasts.
In a recent study by Wetterhall et al. (2013), operational flood forecasters ranked the use of multiple hydrological models as one of the top priorities for improving EFAS. A multi-model system would create a more robust forecasting system by better representing the model structural uncertainties and therefore better assessing the total uncertainty. In view of these uncertainties, as well as the high cost of implementing new model systems, it is of great importance to select the best model(s), not only in terms of performance, but also in terms of feasibility of technical implementation. Trambauer et al. (2013) review 16 large-scale models with a specific focus on suitability for drought forecasting in Africa, and Sood and Smakhtin (2015) review the emergence of global hydrological models, with a focus on 12 models, and trends in and constraints for model development. However, to our knowledge, no study has focused on the applicability of this type of model for large-scale operational flood forecasting.
The aim of this paper is to provide a comprehensive review of large-scale models in the context of their suitability for pan-European operational hydrological forecasting. However, the assessment is deliberately broad, so that it can serve any application on the continental or large sub-basin scale. Special emphasis is put on model availability and adaptivity to the specific purpose, but the hydrological process descriptions of each model are also an important factor. This paper does not contain a direct comparison of hydrological model performance; instead, it focuses on an assessment of the suitability for implementation in an operational flood forecasting system. A review of potential large-scale routing models is also included, since routing is fundamental for flood forecasting but not always included in large-scale hydrological models.

EFAS
The current modelling system within EFAS is fully operational and produces a number of forecast products based on meteorological forcings from three different providers: the European Centre for Medium-Range Weather Forecasts (ECMWF), the Deutscher Wetterdienst (DWD) and the Consortium for Small-scale Modelling (COSMO). The forcings used are: 10-day forecasts from ECMWF (deterministic and ensemble with 51 members), 7-day forecasts from DWD (deterministic) and 5-day forecasts from COSMO (ensemble with 16 members). There is only one hydrological model, LISFLOOD (van der Knijff et al., 2008), which is run on a 5 km grid for the entire European domain with a 6-hourly time step for all forcings apart from the ECMWF ensemble, which is run with a daily time step.
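The forcing set-up above can be written down as a small configuration table, which makes it easy to see how many hydrological runs and model time steps one forecast cycle implies. The sketch below assumes one LISFLOOD run per ensemble member and per forcing in each cycle and ignores how often each provider issues forecasts; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forcing:
    provider: str
    kind: str        # "deterministic" or "ensemble"
    members: int     # number of ensemble members (1 for deterministic)
    lead_days: int   # forecast horizon in days
    step_hours: int  # hydrological model time step in hours


# Forcings as described in the text (assumption: one run per member per cycle).
EFAS_FORCINGS = [
    Forcing("ECMWF", "deterministic", 1, 10, 6),
    Forcing("ECMWF", "ensemble", 51, 10, 24),
    Forcing("DWD", "deterministic", 1, 7, 6),
    Forcing("COSMO", "ensemble", 16, 5, 6),
]


def steps_per_run(f: Forcing) -> int:
    """Number of model time steps needed to cover the forecast horizon."""
    return f.lead_days * 24 // f.step_hours


def runs_per_cycle(forcings) -> int:
    """Total number of hydrological runs (one per member) in one cycle."""
    return sum(f.members for f in forcings)


def total_steps_per_cycle(forcings) -> int:
    """Model time steps summed over all members and forcings."""
    return sum(f.members * steps_per_run(f) for f in forcings)


if __name__ == "__main__":
    for f in EFAS_FORCINGS:
        print(f.provider, f.kind, steps_per_run(f), "steps per run")
    print("runs per cycle:", runs_per_cycle(EFAS_FORCINGS))
    print("time steps per cycle:", total_steps_per_cycle(EFAS_FORCINGS))
```

Under these assumptions a cycle needs 69 runs; if every member also drove each additional hydrological model in a multi-model system, the run count would scale with the number of models.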
EFAS issues flood alerts to the member states' hydrological services based on return periods determined by running the system with observed data for a 20-year period and post-processing the results by fitting a generalised extreme value (GEV) distribution. This ensures that the modelling system is consistent, since the same parameterisations are used in deriving the flood alert levels and in the discharge forecasts. EFAS is a complement to existing national flood forecasting tools, since it forecasts flooding in transboundary catchments across Europe in one system. The main purpose of EFAS is to deliver early probabilistic warnings rather than the very detailed forecasts that one would be able to get from a national forecasting system.
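The threshold procedure described above can be sketched as: take annual maxima from the simulation driven by observed data, fit a GEV distribution, read off discharges for chosen return periods, and compare ensemble forecasts against them. The text does not state the fitting method; this sketch assumes Hosking's L-moment approximation (with his sign convention, where a positive shape k means a bounded upper tail), and the example data are synthetic. Function names are hypothetical.

```python
import math


def sample_l_moments(values):
    """First three sample L-moments (l1, l2, l3) via probability-weighted moments."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three annual maxima")
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def fit_gev_lmoments(annual_maxima):
    """GEV parameters (xi, alpha, k) from Hosking's L-moment approximation."""
    l1, l2, l3 = sample_l_moments(annual_maxima)
    t3 = l3 / l2
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:  # Gumbel limit
        alpha = l2 / math.log(2)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 + alpha * (g - 1.0) / k
    return xi, alpha, k


def return_level(params, period_years):
    """Discharge with annual non-exceedance probability 1 - 1/T."""
    xi, alpha, k = params
    y = -math.log(1.0 - 1.0 / period_years)  # -ln F
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k


def exceedance_probability(ensemble_peaks, threshold):
    """Fraction of ensemble members whose forecast peak exceeds the threshold."""
    return sum(q > threshold for q in ensemble_peaks) / len(ensemble_peaks)


if __name__ == "__main__":
    # Synthetic annual maxima (m3/s) on a Gumbel curve, standing in for 20 years.
    n = 20
    maxima = [100.0 - 20.0 * math.log(-math.log((j + 0.65) / n)) for j in range(n)]
    params = fit_gev_lmoments(maxima)
    peaks = [110.0 + 2.0 * m for m in range(51)]  # hypothetical 51-member forecast
    for T in (2, 5, 20):
        q = return_level(params, T)
        print(f"{T:>2}-year level {q:7.1f} m3/s, P(exceed) = "
              f"{exceedance_probability(peaks, q):.2f}")
```

Because the thresholds and the forecasts come from the same model, a systematic model bias affects both similarly, which is the consistency argument made in the text.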

Criteria for a continental hydrological model
The first step in any model selection process is to assess the aim, resolution and scope of the model system (Bennett et al., 2013; Jakeman et al., 2006). In the case of EFAS, the purpose of the system was clear from the start: to provide a European-wide forecasting system. The development of the system is a constant balance between the wishes of the users, the scientific progress of probabilistic forecasting and constraints due to computational costs and data availability (Wetterhall et al., 2013). This review is a qualitative rather than quantitative model assessment, and as such evaluates the models against a number of selection criteria, for example identification of the user community, demands on model structure, complexity and flexibility (Bennett et al., 2013). The selection criteria are in the end subjective, since they are a consequence of the application in question, and the list below reflects the demands of an operational continental system.
Many technical aspects need to be carefully considered in order to adapt a model to an operational continental-scale modelling system. These include, for example, process descriptions, availability in terms of licensing, open source code etc., and applicability to the problem at hand. One important property of a flood forecasting model is the ability to respond to differences in precipitation patterns across different parts of a catchment. This can be accounted for with a fine spatial distribution (typically of the order of 1–10 km), but can also be addressed through a statistical representation of flood-producing mechanisms. Other processes that are crucial for flood forecasting on the European scale are, for instance, snow accumulation and snowmelt, which affect the timing and magnitude of spring flows in cold regions. In addition, runoff generated within a computational unit needs to be converted to discharge and routed along a river network to produce discharge forecasts along a river course. However, since not all large-scale hydrological models include a routing component, this study also provides a short review of existing large-scale river routing models (see Appendix B in the online supplementary material).
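The conversion of runoff generated within a computational unit into discharge is simple unit arithmetic: a runoff depth over the unit's area gives a volume, and dividing by the time step gives a mean flow. The sketch below assumes a square 5 km cell (25 km²) and a 6-hourly step purely as an illustration; the function name is hypothetical.

```python
def runoff_to_discharge(runoff_mm, cell_area_km2, step_hours):
    """Mean discharge (m3/s) from a runoff depth (mm per time step) over a unit."""
    volume_m3 = runoff_mm * 1e-3 * cell_area_km2 * 1e6  # depth (m) x area (m2)
    return volume_m3 / (step_hours * 3600.0)            # volume / step length (s)


if __name__ == "__main__":
    # 1 mm of runoff in 6 h over a 5 km x 5 km cell.
    print(f"{runoff_to_discharge(1.0, 25.0, 6):.2f} m3/s")
```

One millimetre of runoff in six hours over such a cell corresponds to about 1.16 m³/s; it is this local contribution that the routing scheme then transports downstream.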
In addition to the review of the process descriptions, a list of criteria was set up to guide the model selection in terms of adaptivity to continental-scale forecasting. The criteria were modelled on EFAS for use over a pan-European domain and may not fully encompass aspects that are important in other regions, such as areas with limited data available for calibration. Additional criteria, such as transferability to ungauged catchments, need to be carefully considered in such cases, but are outside the scope of this paper. The following list summarises the aspects considered important for the implementation of a hydrological model on a continental scale:
1. Availability of model code. Code must be available for use (open source or through agreements) with possibilities of adaptation to specific purposes. Executable code is not sufficient, since changes to, for instance, the reading of input data will be necessary. Open source code also allows for in-house bug fixing; forecast deliveries run the risk of being delayed if bug fixes or updates cannot quickly be incorporated in the model.
2. Existing user community. Code must be actively used and developed, with core developers identified, to ensure that proper support can be given in the initial phases and that the model is constantly developing.
3. Input data requirements. The input data required by the model must be obtainable from existing databases, both in terms of spatial and temporal resolution and in terms of the variables needed.
4. Flexibility to grid structure. A majority of large-scale models use a gridded structure where each cell represents a computational unit (sometimes sub-divided into tiles), but some models use sub-basins or other types of sub-domains as the computational units. In such cases, flexibility to gridded input data is important.
5. Possibility of calibration with suitable tools. Many large-scale models use parameter values that are set a priori from, e.g., soil and vegetation maps. However, calibration of models with discharge data generally improves the performance, as seen, for instance, in the Project for Intercomparison of Land surface Parameterization Schemes (PILPS) (Wood et al., 1998).
6. Flexibility in resolution. Models must provide the possibility of being run at a spatial resolution of 1–10 km. If down- or up-scaling is needed, it is an advantage if the procedure is straightforward. If models are not run in a grid structure, flexibility in the resolution of computational units should also be investigated.
7. Facility for introducing discharge observation stations (data assimilation). It is an advantage if some facility is available to introduce observational data, for example from discharge stations, to update the model.
8. Existing large-domain model set-up. It is beyond the scope of this paper to review the large number of catchment models that have been developed for specific hydrological settings and their potential for up-scaling; therefore, only models that have a clear large-scale focus have been included.
This list reflects the specific needs of the operational EFAS and may therefore need to be adapted with different sets of criteria for other applications. However, most of the requirements can be considered universal for any continental or large-scale system.
Table 1. Large-scale models included in the review.

Models included in the review
A total of 24 models were initially listed as potential candidates for operational flood forecasting (see Table 1 for main references and applications). These models were included because of their specific aim towards large-scale applications, and they include LISFLOOD, the model currently used in EFAS, as the reference model that is to be complemented by the candidate models. Many widely used rainfall-runoff models were discarded because they had not been applied to or developed for large-scale applications. The models were assessed based both on literature reviews and on communication with developers.
All of the listed models have been applied over large-scale basins or continental or global domains (Table 1). It is neither possible nor the aim of this paper to assess the individual performance of the models in the context of flood forecasting. Although all models have been evaluated against observed discharge to some extent, this has most commonly been done for long-term averages (e.g. mean monthly or annual discharges). There are exceptions, however: G2G is used for operational flood forecasting for the United Kingdom (Price et al., 2012), and H08 has been set up for quasi-real-time simulation for the Chao Phraya River in Thailand (Hanasaki et al., 2014). On the global scale, VIC has been set up for real-time flood estimation within the Global Flood Monitoring System (GFMS; Wu et al., 2014), and H-TESSEL is used (for runoff simulation) in combination with LISFLOOD (for routing) in the Global Flood Awareness System (GloFAS; Alfieri et al., 2013).
The models differ greatly in complexity, from simpler conceptual models that only simulate the water balance and discharge to more physically based models that also include energy, nutrient and carbon fluxes and dynamic vegetation simulations. Appendix A in the online supplementary material summarises similarities and differences in process descriptions and provides detailed information on all 24 models (Table A1).
The review of the process descriptions highlighted that some models lack a routing scheme (Table A1), which is a fundamental feature of any flood forecasting system. Those models would therefore require coupling to a routing scheme before they can be operationally implemented. Several large-scale standalone routing models were therefore reviewed for use within the forecasting scheme (detailed information on these is summarised in Appendix B in the online supplementary material). The complexity of the routing methods ranged from simple linear storage-discharge schemes to solutions of the full shallow water equations. Since these routing schemes are available for coupling, the lack of a routing scheme in a hydrological model does not constitute a reason to exclude it from the list of possible candidates for EFAS.
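At the simple end of that range, a linear storage-discharge scheme treats each reach as a reservoir whose outflow is proportional to its storage (S = kQ). The sketch below is a generic cascade of linear reservoirs along a single channel without tributaries, not any of the reviewed routing models; it uses the analytical solution for inflow held constant over a step so the water balance closes exactly. The storage constants and the hydrograph are hypothetical.

```python
import math


def linear_reservoir(inflow, k_hours, dt_hours, storage0=0.0):
    """Route inflow (m3/s) through a linear reservoir S = k * Q.

    Solves dS/dt = I - S/k analytically for inflow constant over each step.
    Returns (mean outflow per step in m3/s, final storage in m3).
    """
    k = k_hours * 3600.0
    dt = dt_hours * 3600.0
    decay = math.exp(-dt / k)
    s = storage0
    outflow = []
    for i in inflow:
        s_new = s * decay + i * k * (1.0 - decay)
        outflow.append(i + (s - s_new) / dt)  # mean outflow from the water balance
        s = s_new
    return outflow, s


def route_chain(inflow, k_hours_per_reach, dt_hours):
    """Pass a hydrograph through consecutive reaches of a river without tributaries."""
    q = list(inflow)
    storages = []
    for k_hours in k_hours_per_reach:
        q, s = linear_reservoir(q, k_hours, dt_hours)
        storages.append(s)
    return q, storages


if __name__ == "__main__":
    hydrograph = [0.0, 50.0, 100.0, 50.0] + [0.0] * 20  # hypothetical, 6-hourly
    out, _ = route_chain(hydrograph, [12.0, 12.0], 6.0)
    print("peak in:", max(hydrograph), "peak out:", round(max(out), 1))
```

The routed peak is lower and later than the inflow peak; schemes closer to the shallow water equations also represent backwater and floodplain effects, which a linear reservoir cannot.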

Code availability and applicability
Most of the models in this review are freely available as open source or upon request and PCR-GLOBWB is currently being prepared to become open source. However, some of the models are only available as executable files (G2G and GWAVA) and some not at all (WaterGAP and WASMOD-M). Mac-PDM is available in principle, but has been used as a research tool and has not been documented with the purpose of providing guidance for new potential users (N. Arnell, pers. comm., 28 Nov 2014). The lack of a user manual may make implementation difficult although the code is available. The availability of code and technicalities related to the model applicability within a pan-European operational flood forecasting scheme are summarised in Table 2. LaD was excluded from the table since the code is no longer maintained or developed (P.C.D. Milly, pers. comm., 21 Mar 2014).
The horizontal discretisation is most commonly based on regular or irregular grids of different resolutions (Table 2). However, E-HYPE, SWAT and SWIM use a sub-basin structure instead, and WBMplus is flexible in this respect. All models can, however, be applied on a regular grid if needed. In addition, the grid cells/sub-basins often include representations of sub-grid/sub-basin variability in, e.g., vegetation. These computational units are named differently in different models, e.g. classes (E-HYPE), hydrotopes (SWIM) and tiles (JULES and H-TESSEL), and are not spatially organised like the grid cells/sub-basins (i.e. they represent fractions of the spatially organised units). mHM uses a spatial discretisation with three different levels, which are aimed at describing the spatial variability in i) forcing, ii) hydrological processes and geology, and iii) topography, soils and land cover (Samaniego et al., 2014a).
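Because tiles, classes or hydrotopes are fractions of a unit rather than located sub-areas, their results are typically combined into a single value per grid cell or sub-basin before routing. A minimal sketch of that area-weighted aggregation follows; the tile names, fractions and runoff values are hypothetical and do not reproduce any particular model's scheme.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    name: str         # e.g. a land-cover class within the cell
    fraction: float   # share of the cell area covered by this tile
    runoff_mm: float  # runoff depth simulated for this tile


def cell_runoff(tiles, tol=1e-6):
    """Area-weighted cell runoff (mm) from its non-spatial tiles."""
    total = sum(t.fraction for t in tiles)
    if abs(total - 1.0) > tol:
        raise ValueError(f"tile fractions sum to {total}, expected 1")
    return sum(t.fraction * t.runoff_mm for t in tiles)


if __name__ == "__main__":
    tiles = [Tile("forest", 0.5, 2.0), Tile("grassland", 0.3, 4.0), Tile("urban", 0.2, 10.0)]
    print(f"cell runoff: {cell_runoff(tiles):.2f} mm")
```

Since the tiles carry no position, only the cell mean can be passed on to the river network; any within-cell spatial pattern of runoff is lost at this step.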
In terms of resolution, most models are flexible given that input data are available, but recalibration is often necessary. It is recommended that TOPLATS is not run on a grid coarser than 1 km (N. Chaney, pers. comm., 30 Apr 2014). MATSIRO is perceived as difficult to downscale to the required resolution of EFAS (K. Takata, pers. comm., 25 Mar 2014). Input data requirements vary greatly between the models depending on their degree of complexity. In terms of forcing, many can work with a relatively limited number of variables. In addition, the models need static input, e.g. vegetation and soil types, to varying degrees. However, an exhaustive list of these inputs was not feasible, since these requirements largely depend on the degree of detail a particular application needs. In the European context, the data requirements in terms of variables and resolution are unlikely to be an issue, but for set-ups over other spatial domains the requirements may need more careful consideration.
Calibration tools are available for some models, but the degree of calibration also differs greatly between models. Some models are not calibrated against observed discharge at all (e.g. CLM, LPJml and MPI-HM), and some are manually calibrated to specific sites (e.g. H08 and NOAH-MP). The type of calibration tools provided differs between models. Facilities to introduce new discharge observation stations are provided by all models that are typically calibrated and come with calibration tools, and also by CLM for comparison purposes. mHM has been specifically developed so that it does not need recalibration when applied at different resolutions (Samaniego et al., 2014a).
Many models are set up for the global domain, but do not have any specific European model set-up. Models that are set up specifically for the pan-European domain are: E-HYPE, G2G, GWAVA, H-TESSEL, JULES, LISFLOOD, LPJml and mHM. About half of the models can include observed discharge for different purposes (mainly calibration). In the current EFAS set-up, observations are used for post-processing, but with future developments there is scope for incorporating remote sensing data such as soil moisture (Massari et al., 2015), snow cover (Che et al., 2014) and altimeter data (Munier et al., 2015) into an operational framework. Soil moisture and snow cover from H-SAF (http://hsaf.meteoam.it/) are currently displayed in EFAS but are not yet assimilated into the hydrological model.

Evaluation of suitability for operational flood forecasting within EFAS
The final decision on the best model for a certain application is by necessity subjective, although an objective evaluation will guide the choice of suitable models. For the sake of argument, we discuss model suitability from the viewpoint of EFAS. It must be stressed that the classification does not reflect a model's quality or performance in terms of its specific purposes; it merely serves as an indication of the suitability for inclusion in the specific modelling framework of EFAS. Furthermore, the suitability may change with time if more criteria are met by a certain model.
The least suitable models are those that lack important features or whose source code is not available. The models classified as least suitable are either impossible to use (due to lack of access to the code) or require so many modifications that it is not reasonable to proceed with them in the light of more promising candidate models. LISFLOOD, the model currently used within EFAS, is naturally classified as suitable, as it requires no modifications, but it should be pointed out that its code is not yet open source, which may render it less suitable for potential users outside the JRC in the near future. Ten models were found to be the least suitable for inclusion in EFAS, and for transparency the main reason for this classification is given for each model: CLM (calibration would be complex), G2G (only executables available), GWAVA (only executables available), LaD (code no longer maintained or developed), LPJml (discharge calibration not advisable), Mac-PDM (code available, but lacks documentation and instructions), MATSIRO (difficult to downscale), MPI-HM (calibration not advisable), WASMOD-M (code not available) and WaterGAP (code not available).
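Criterion 1 (availability of modifiable, maintained and documented code) works as a hard screen, and recording the reason for each exclusion keeps the classification transparent, as done above. The sketch below encodes only the code-access statuses reported in this review; mHM and WBMplus are entered as having usable source code on the assumption implied by their classification as requiring little adaptation. The enum and function names are hypothetical.

```python
from enum import Enum


class CodeAccess(Enum):
    SOURCE = "source code available"
    SOURCE_UNDOCUMENTED = "code available, but lacks documentation and instructions"
    EXECUTABLE_ONLY = "only executables available"
    UNAVAILABLE = "code not available"
    UNMAINTAINED = "code no longer maintained or developed"


# Snapshot of statuses reported in the review (mHM and WBMplus assumed).
ACCESS = {
    "G2G": CodeAccess.EXECUTABLE_ONLY,
    "GWAVA": CodeAccess.EXECUTABLE_ONLY,
    "WaterGAP": CodeAccess.UNAVAILABLE,
    "WASMOD-M": CodeAccess.UNAVAILABLE,
    "LaD": CodeAccess.UNMAINTAINED,
    "Mac-PDM": CodeAccess.SOURCE_UNDOCUMENTED,
    "mHM": CodeAccess.SOURCE,      # assumption
    "WBMplus": CodeAccess.SOURCE,  # assumption
}


def screen_code_access(models, access=ACCESS):
    """Split models into candidates and exclusions, keeping the exclusion reason."""
    candidates, excluded = [], {}
    for model in models:
        status = access[model]
        if status is CodeAccess.SOURCE:
            candidates.append(model)
        else:
            excluded[model] = status.value
    return candidates, excluded


if __name__ == "__main__":
    keep, drop = screen_code_access(ACCESS)
    print("candidates:", keep)
    for model, reason in drop.items():
        print(f"excluded {model}: {reason}")
```

The remaining exclusions in the text (CLM, LPJml, MATSIRO and MPI-HM) rest on calibration and resolution rather than code access, so they would need further criteria in the same style.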
All the remaining models could be seen as suitable for inclusion in EFAS to varying degrees, depending on the amount of work required to adapt them to the operational system. The main considerations are discussed in the following. JULES, H-TESSEL and ORCHIDEE are typically not calibrated and would need either to be run without calibration or to have calibration methods developed for them. For H08 and NOAH-MP there are no calibration tools available, and they would also require some development in that respect. SWAT requires a large number of inputs (e.g. on tilling and fertilisers), which can be difficult to obtain with sufficient detail even over the European domain, and the work needed to gather and prepare these data is not necessarily justifiable for the purpose of flood forecasting. In addition, SWAT, E-HYPE and SWIM are typically run for sub-basins rather than regular grids, and although it is possible to run them on regular grids, this might need special consideration, e.g. for calibration. If they are run with a sub-basin structure, on the other hand, some post-processing tools will need to be developed to make model outputs comparable within the ensemble. The sub-basin approach has the advantage of using natural boundaries and thereby a more correct representation of the actual watershed characteristics. TOPLATS and VIC do not offer full flexibility in resolution within the 1–10 km span (the upper limit for TOPLATS is approx. 1 km and the lower limit for VIC approx. 6 km), which might also require consideration in terms of making results within the ensemble comparable. Models that appear to require little work in terms of adaptation to the current EFAS system are mHM and WBMplus.
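Criterion 6 can be checked by intersecting each model's usable resolution range with the required 1–10 km span, and intersecting across models shows whether a common grid exists for an ensemble. The sketch uses only the approximate TOPLATS and VIC limits given above; the unreported bound on the other side of each range is treated as unlimited, which is an assumption. Names are hypothetical.

```python
TARGET_KM = (1.0, 10.0)  # required span of grid resolutions (criterion 6)

# Approximate usable resolution ranges in km; None = no limit reported (assumption).
RESOLUTION_LIMITS = {
    "TOPLATS": (None, 1.0),  # not recommended coarser than ~1 km
    "VIC": (6.0, None),      # lower limit ~6 km
}


def covered_span(limits, target=TARGET_KM):
    """Part of the target span a model can run at, or None if there is no overlap."""
    lo = target[0] if limits[0] is None else max(target[0], limits[0])
    hi = target[1] if limits[1] is None else min(target[1], limits[1])
    return (lo, hi) if lo <= hi else None


def fully_flexible(limits, target=TARGET_KM):
    """True if the model can run anywhere in the target span."""
    return covered_span(limits, target) == target


def common_span(all_limits, target=TARGET_KM):
    """Resolutions usable by every model in an ensemble, or None if none exist."""
    lo, hi = target
    for limits in all_limits:
        span = covered_span(limits, (lo, hi))
        if span is None:
            return None
        lo, hi = span
    return lo, hi


if __name__ == "__main__":
    for name, limits in RESOLUTION_LIMITS.items():
        print(name, covered_span(limits), "fully flexible:", fully_flexible(limits))
    print("common span:", common_span(RESOLUTION_LIMITS.values()))
```

An empty common span means that at least one model's output would have to be aggregated or disaggregated to another grid before ensemble members can be compared, which is the consideration raised above.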
Since some of the possible candidate models lack routing schemes, they require coupling to a standalone routing model (Appendix B in the online supplementary material). The short review of routing models showed that CaMa-Flood, TRIP2 and LISFLOOD-FP could be suitable candidates, since they have been successfully coupled with some of the candidate models and therefore most likely offer the simplest implementation in the framework.
It should be noted that this review assesses the models as they are currently implemented, but since many of them are under development, it is likely that the process descriptions in Table 2 and their suitability will change with time through developments following research interests and user requests. The requirement of open source code and an active developer community is therefore very important if a model is going to be used operationally.

Concluding remarks
A large number of large-scale hydrological models were reviewed with the aim of assessing their suitability for an operational flood forecasting framework for Europe. Models were assessed both with regard to their process descriptions and their applicability for the particular purpose. Important criteria for their applicability were, for example, the availability of source code with an active developer community, flexibility in resolution and calibration tools. The selected criteria for any application will naturally depend on the specific needs and will be different for each case. The criteria are therefore subjective, but they are important for any operational set-up that cannot rely on in-house development.
This technical review can only serve as a first step in the model selection for EFAS or any other operational flood forecasting scheme. The identified candidate models need to be quantitatively tested over a number of basins to assess the hydrological performance of each model before any of them are included in an operational framework. Model performance will be important when selecting a model, but so will selecting models that use different modelling approaches and capture different hydrological signals, since the idea of a multi-model system is to capture many aspects of the hydrological regime (e.g. peak discharges, lakes and reservoirs, flash floods and spring floods). This will add more information to the system, but will also demand good post-processing that can weight the models according to their performance, especially in an operational framework where the forecaster has to make an informed decision based on a plethora of information.
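One of the simplest ways to weight models according to their performance is to make each weight inversely proportional to a hindcast error score and normalise the weights to sum to one. The sketch below uses that scheme purely as an illustration; it is not a method proposed in this review, and operational post-processing would more likely use calibrated approaches such as Bayesian model averaging. Model names, errors and forecasts are hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error (e.g. hindcast MSE), summing to 1."""
    if any(e <= 0 for e in errors.values()):
        raise ValueError("errors must be positive")
    inverse = {model: 1.0 / e for model, e in errors.items()}
    total = sum(inverse.values())
    return {model: w / total for model, w in inverse.items()}


def weighted_forecast(forecasts, weights):
    """Combine equal-length per-model discharge forecasts into one weighted series."""
    n = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in forecasts) for t in range(n)]


if __name__ == "__main__":
    errors = {"model_a": 100.0, "model_b": 300.0}  # hypothetical hindcast MSE
    forecasts = {"model_a": [100.0, 200.0], "model_b": [200.0, 400.0]}
    weights = inverse_error_weights(errors)
    print(weights)
    print(weighted_forecast(forecasts, weights))
```

A single weighted series discards the spread between models, so in practice such weights would more usefully be applied to the members of a pooled ensemble than to collapse it into one forecast.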