Evaluating Multi-Scale Flow Predictions for the Connecticut River Basin

This case study evaluates a computationally efficient distributed hydrological model, the Coupled Routing and Excess Storage (CREST) model, for flood modeling in sub-basins of the Connecticut River Basin (CRB). Discharges are simulated by forcing CREST with an eight-year record of high-resolution radar rainfall and with potential evapotranspiration maps derived from the North American Regional Reanalysis. Model performance is evaluated against observed streamflow from United States Geological Survey gauging stations at the outlets and interior points of several CRB sub-basins. CREST parameters were calibrated on a three-year record (2005-2007) and validated on the remaining data (2003-2004 and 2008-2009). Performance is assessed with several metrics: the Nash-Sutcliffe Coefficient of Efficiency (NSCE), Mean Relative Error (MRE), Root Mean Square Error (RMSE), and Pearson Correlation Coefficient (PCC). The analysis shows that CREST slightly underestimated peak flows but generally simulated the streamflow variability of the CRB basins well. Specifically, NSCE, MRE, RMSE, and PCC values of the hourly flow simulations ranged from 0.31 to 0.58, -0.06 to 0.13, 61 to 121 (mm), and 0.60 to 0.83, respectively. At the daily time scale the performance metrics improved, indicating that CREST is sufficiently accurate for long-term, multi-scale hydrologic simulation.


Introduction
Rainfall-runoff modeling has a long history: the first hydrologist to use a rainfall-runoff model was the Irish engineer Thomas James Mulvaney (1822-1892), who published his work in 1851. During the last few decades, a number of conceptual and physically based models have been developed and used to simulate floods [1][2][3][4][5][6]. Hydrological model simulations and their spatiotemporal fluctuations are vital tools for management activities such as flood risk management, water supply, and water quality improvement. Brakenridge et al. [7] showed that increasing population density has raised the risk posed by natural processes such as flooding, which makes it crucial to identify areas at risk. Dehotin and Braud [8] stressed the importance of distributed hydrologic models, describing them as valuable tools for realistically predicting the spatial distribution of water balance components. Hydrologic models can be used to evaluate alternatives for mitigating flood and drought risks and to study the impact of land use on water resources. Moriasi et al. [9] also state that hydrological modeling is an essential tool for managing water quantity and quality. The spatial structure of distributed hydrological models allows them to represent the heterogeneity of watershed characteristics and model parameters [8,10].
Although distributed models capture realistic catchment characteristics in detail, overparameterization can be a problem during calibration. Dehotin and Braud [8] noted growing concern about optimal parameterization and identified the main issue as the tension between model complexity and data availability. Rozalis et al. [11] stated that the choice between simplicity and complexity in representing a catchment remains controversial; in their study, a simple model with a minimum number of parameters was used, since a model with few parameters requires less calibration, which reduces uncertainty over ungauged areas. Similarly, Bergstrom [12] argued that an excessive increase in model complexity does not always improve the quality of the results, and Brandt [13] supported this view, finding that moving from a complex to a simpler model did not affect model performance. Thus, matching model complexity to the availability and resolution of data is key to improving hydrologic predictability. Another important aspect of parameterization concerns the initial conditions and the associated parameter sets. A model's initial condition depends mostly on catchment area, catchment topography, antecedent moisture, groundwater table position, and land use. For distributed hydrological modeling, the parameterization of these initial conditions and of their spatial variability is therefore fundamental for runoff simulation, especially for extreme rainfall events [14]. The influence of initial conditions on model output decreases as the simulation period lengthens [15], so a spin-up (run-up) period can be used to reduce sensitivity to initial conditions. In this study, a one-year spin-up period was used, which reduced the number of initialization parameters that had to be specified.
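The effect of a spin-up period can be illustrated with a deliberately simplified model. The Python sketch below uses a single linear reservoir, an assumption standing in for CREST's soil and reservoir states, and runs it twice on the same synthetic rainfall from two different initial storages. The function name, recession coefficient, and rainfall series are hypothetical.

```python
import numpy as np

def linear_reservoir(rain, k, s0):
    """Outflow from a single linear reservoir (hypothetical toy model, not CREST).

    rain: inflow per time step; k: fraction of storage released per step;
    s0: initial storage, i.e. the initial condition the spin-up should wash out.
    """
    storage = float(s0)
    outflow = np.empty(len(rain))
    for t, p in enumerate(rain):
        storage += p                 # add this step's rainfall
        outflow[t] = k * storage     # linear release: Q = k * S
        storage -= outflow[t]
    return outflow

rng = np.random.default_rng(42)
rain = rng.gamma(shape=0.3, scale=2.0, size=2000)   # synthetic hourly rainfall

dry_start = linear_reservoir(rain, k=0.01, s0=0.0)
wet_start = linear_reservoir(rain, k=0.01, s0=500.0)
gap = np.abs(wet_start - dry_start)   # influence of the initial condition over time
print(gap[0], gap[1000])
```

For a linear reservoir the gap between the two runs decays as k * s0 * (1 - k)^t regardless of the rainfall, so the spin-up needed depends on how slowly the model's stores drain; CREST's nonlinear stores would not decay at a single fixed rate.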
Two further significant issues in hydrological modeling concern the calibration and validation methodology, and specifically the metrics used to analyze and evaluate simulation results [9,10,12,16,17]. In this study, we evaluated the accuracy of a grid-based distributed hydrologic model calibrated and validated in the Connecticut River Basin. Model performance in the calibration and validation periods was evaluated with the Nash-Sutcliffe Coefficient of Efficiency (NSCE), Mean Relative Error (MRE), Root Mean Square Error (RMSE), and Pearson Correlation Coefficient (PCC), computed by comparing area-normalized simulated flows with hourly observations in the basin. A continuous simulation of the Connecticut River Basin was performed with the Coupled Routing and Excess Storage (CREST) hydrological model. CREST was selected because it currently underpins both a national [18] and a global [1] flood simulation system. As in other distributed models, the catchments are represented as grid cells to simulate the spatiotemporal variation of water and energy fluxes and storage.
One purpose of this work is to examine the performance of CREST, with calibrated parameters, at interior sub-basin stations. Calibrating a model for a larger basin and using the resulting parameters at ungauged locations is a common approach when no observations are available. According to Bingeman et al. [19], a hydrological model calibrated against streamflow data should be applicable to other sub-catchments without recalibrating its parameter set. Parameters describing land cover and soil texture for a given set of watersheds can be applied to hydrologically similar watersheds without extensive recalibration [20]. Kouwen et al. [21] likewise confirmed that a parameter set can be transferred effectively between watersheds in southern Ontario to simulate peak flows without further calibration. In addition, Xie et al. [22] transferred calibrated parameters from data-rich to data-sparse areas and obtained promising daily runoff estimates. Accordingly, CREST was calibrated against streamflow observations at sub-basin outlets, and the same parameter sets were then used, without further calibration, to simulate streamflow at interior nested sub-basins.
This case study evaluates the multi-basin predictability of streamflow in the Connecticut River Basin using the CREST model, following calibration procedures suggested in the literature. Streamflow fluctuations matter for water supply during seasons of peak water demand and for mitigating the flood risk associated with peak flows. The study is an opportunity to refine the estimation of hydrological processes with CREST and to understand its calibration, parameterization, and validation over a mid-latitude basin. The main objectives are to (1) use available observations to assess the accuracy of the simulations, (2) determine the spatial variability of the calibration parameters, and (3) assess whether calibrated parameter values can be applied to interior catchments to predict flows at ungauged locations. Based on topography and the locations of USGS gauges, the Connecticut River Basin was divided into nine sub-basins for all CREST simulations. Knowing how the predictions fluctuate and how accurate they are supports simulation at ungauged locations. For this purpose, three sub-basins with enough interior gauges for an ungauged analysis were further subdivided into sub-watersheds, which were simulated with the calibrated parameters of their parent basins.
The paper is organized as follows. Section 2 describes the study area, data, and the CREST hydrological model. Section 3 provides information regarding the model calibration, parameterization and validation with analysis of evaluated statistics. Section 4 provides a discussion of the results of the hydrological simulation using the CREST model for the Connecticut River Basin. The conclusions and future work are discussed in the last section.

Study Region and Data
The study area for this work is the Connecticut River Basin (CRB), a major river basin in New England whose runoff discharges to the Connecticut River. The basin extends from Quebec, Canada, through New Hampshire (NH), Vermont (VT), Massachusetts (MA), and Connecticut (CT), and the river empties into Long Island Sound at Old Lyme, Connecticut. The total area contributing to the Connecticut River is approximately 28,500 km². About 390 towns and cities, with a total population of approximately 2.3 million people, lie within the watershed. Land use in the watershed consists of forest, agriculture, residential areas, and water: approximately 79% of the area is forested, 11% is agricultural, and the remainder is residential or water. The Connecticut River flows for about 660 km; it provides hydroelectric power, is navigable up to Windsor Locks, and is used for irrigation and recreation [23].
Instantaneous discharge records from nine United States Geological Survey (USGS) gauging stations within the Connecticut River Basin were analyzed in this study. The gauging stations and their contributing sub-basins are shown in Figure 1, where black dots mark the streamflow gauges, labeled with their USGS station numbers. The sub-basins were defined from the locations of the USGS stations together with topography and the resulting watershed areas: given a gauging station and DEM data, the area contributing runoff to that station can be delineated. Data from the nine stream gauges were used to calibrate and validate continuous CREST simulations for watersheds with drainage areas ranging from 200 to 25,000 km².
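Delineating the area that drains to a gauge is usually done on a flow-direction grid derived from the DEM. The sketch below assumes a precomputed D8 flow-direction raster with the common ESRI power-of-two encoding (1 = east, 2 = southeast, ..., 128 = northeast) and walks upstream from the gauge cell. The function names, the encoding, and the toy grid are assumptions for illustration, not the authors' delineation workflow.

```python
import numpy as np

# ESRI-style D8 codes -> (row offset, column offset) of the downstream neighbour.
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}

def contributing_mask(fdir, gauge):
    """Boolean mask of the cells whose D8 flow path passes through `gauge` (row, col)."""
    nrows, ncols = fdir.shape
    mask = np.zeros(fdir.shape, dtype=bool)
    mask[gauge] = True
    stack = [gauge]
    while stack:
        r, c = stack.pop()
        for code, (dr, dc) in D8_OFFSETS.items():
            nr, nc = r - dr, c - dc  # neighbour that would drain into (r, c) with this code
            if (0 <= nr < nrows and 0 <= nc < ncols
                    and not mask[nr, nc] and fdir[nr, nc] == code):
                mask[nr, nc] = True
                stack.append((nr, nc))
    return mask

def drainage_area_km2(fdir, gauge, cell_area_km2):
    """Contributing area = number of upstream cells (gauge included) x cell area."""
    return contributing_mask(fdir, gauge).sum() * cell_area_km2

# Toy 3 x 3 flow-direction grid; the gauge sits at the bottom-centre cell.
fdir = np.array([[2, 4, 8],
                 [1, 4, 16],
                 [1, 4, 1]])
print(drainage_area_km2(fdir, (2, 1), cell_area_km2=1.0))  # 8 of the 9 cells drain to the gauge
```

In practice the flow-direction grid would come from a GIS tool after pit filling; the upstream walk itself is the same for any grid size.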
Zanon et al. [24] argued that flood response can be reproduced reasonably well when high-resolution rainfall observations are used. Although continuous gauge records are available for the area, they contain several gaps. As Hirpa et al. [25] reported, unequal record periods among observation gauges may reduce the reliability of parameter estimates, and Du et al. [26] noted that many stations in the USA have short or limited flow records. The streamflow measurements are available at 5-min, 15-min, and 30-min intervals, with 15-min intervals being the most common; these sub-hourly measurements were converted into hourly mean streamflow (m³/s) for January 2002 to December 2009. The input data sets consist of gridded radar rainfall (mm/h) and potential evapotranspiration (PET, mm/3h). The precipitation data were extracted from the WSR-88D-based Stage IV product, and the Stage IV radar rainfall fields were used to force the model at an hourly time step and 4 km grid resolution. PET data from the North American Regional Reanalysis (NARR), available at 32 km spatial and 3-hourly temporal resolution [27], were used as the second forcing variable for the hydrological model.
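The temporal harmonization described above can be sketched as follows. This is a minimal illustration, assuming that each hourly value is the arithmetic mean of the valid instantaneous samples falling within that clock hour (a time-weighted mean is an alternative when sampling intervals mix) and that each 3-hourly PET total is spread uniformly over its three hours. The function names are hypothetical, and regridding PET from 32 km to the 4 km model grid is not shown.

```python
import numpy as np

def to_hourly_mean(t_minutes, q, n_hours):
    """Average sub-hourly discharge samples (m^3/s) into hourly means.

    t_minutes: sample times in minutes since the start of the record (>= 0), so
    5-, 15- and 30-min records can be mixed. NaN marks a missing sample.
    Hours without any valid sample are returned as NaN.
    """
    t = np.asarray(t_minutes, dtype=float)
    q = np.asarray(q, dtype=float)
    ok = ~np.isnan(q)
    hour = (t[ok] // 60).astype(int)                       # clock hour of each sample
    sums = np.bincount(hour, weights=q[ok], minlength=n_hours)[:n_hours]
    counts = np.bincount(hour, minlength=n_hours)[:n_hours]
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

def pet_3h_to_hourly(pet_mm_per_3h):
    """Spread each 3-hourly PET total (mm/3h) uniformly over its three hours (mm/h)."""
    return np.repeat(np.asarray(pet_mm_per_3h, dtype=float) / 3.0, 3)

# 15-min samples for three hours; the last hour has no valid data.
print(to_hourly_mean([0, 15, 30, 45, 60, 75], [1.0, 2.0, 3.0, 4.0, 10.0, np.nan], 3))
print(pet_3h_to_hourly([3.0, 6.0]))
```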
In addition to the radar rainfall and PET data, high-resolution Digital Elevation Model (DEM) data were used to delineate the watersheds and to route the simulated streamflow.

Methods of Analysis
Model
A raster-based distributed hydrologic model, Coupled Routing and Excess Storage (CREST), was implemented over the Connecticut River Basin. CREST is a hybrid modeling strategy developed by the University of Oklahoma and the NASA SERVIR Project Team (SERVIR, from the Spanish for "to serve") [1]. It is a spatially distributed rainfall-runoff model designed to simulate discharge at regional and global scales and to represent the hydrological processes associated with floods and droughts. CREST can be applied to a wide range of hydrological problems and accommodates different land cover and soil type scenarios at user-defined spatiotemporal resolutions. The model provides water managers with streamflow information that can support better decisions about water resources, floods, and agriculture.
CREST uses DEM, PET, and precipitation inputs at user-defined spatiotemporal resolutions. The modeled processes include rainfall-runoff generation, cell-to-cell routing, canopy interception, infiltration, evaporation, recharge, baseflow, sub-grid variability of soil moisture storage capacity, and routing at the sub-grid scale. The model limits the storage of infiltrating water in connected layers of the soil profile, and the excess generates surface runoff. Cell-to-cell routing of direct surface runoff uses a kinematic wave assumption. In addition, the runoff generation and routing components are coupled through feedback mechanisms, which yields a more realistic simulation of hydrologic state variables such as soil moisture. CREST has been described extensively in previous papers [1,28], where details of the model can be found.
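Sub-grid variability of soil moisture storage capacity is commonly represented with a variable infiltration curve, in which point storage capacity varies across a cell. As an illustration only, the sketch below uses the classic Xinanjiang/VIC form of that curve, with the areal-mean capacity and the curve exponent playing the roles of the parameters CREST calls pwm and pb. Treating the impervious fraction (pim) as direct runoff, ignoring interception and evaporation, and the function name are assumptions; CREST's own implementation may differ in detail.

```python
def vic_runoff(p, w, wm, b, im=0.0):
    """Surface runoff (mm) for one time step using a variable infiltration curve.

    p: net rainfall (mm); w: current areal-mean soil water (mm), 0 <= w <= wm;
    wm: areal-mean soil water capacity (mm), cf. "pwm";
    b: curve exponent, cf. "pb"; im: impervious fraction, cf. "pim" (assumed direct runoff).
    """
    wmm = wm * (1.0 + b)                                    # largest point storage capacity
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))  # point capacity matching w
    if p + a >= wmm:                                        # whole cell becomes saturated
        r = p - (wm - w)
    else:                                                   # only part of the cell saturates
        r = p - (wm - w) + wm * (1.0 - (p + a) / wmm) ** (1.0 + b)
    return im * p + (1.0 - im) * r

# Wetter soil converts more of the same 20 mm of rain into surface runoff.
for w in (0.0, 60.0, 95.0):
    print(w, round(vic_runoff(p=20.0, w=w, wm=100.0, b=0.5), 2))
```

The rainfall not returned as runoff (p minus the result) is the infiltration that refills soil storage.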
CREST has several default parameters whose value ranges are initially specified from land cover and soil type data. Table 1 lists the parameters and their value ranges for the CRB. The model has 14 parameters, three of which relate to initial soil conditions (the initial value of soil water "iwu", the initial value of the overland reservoir "iso", and the initial value of the interflow reservoir "isu") and can be adjusted using a warm-up period. Most parameter ranges follow from physical considerations. For example, the slope flow speed multiplier "coem" can be regarded as the inverse of Manning's roughness, and "river" is the equivalent multiplier for river channels [29]. Parameter "under" represents the horizontal velocity of subsurface flow, for which hydraulic conductivity is used [1,30]. "leako" ("leaki") is the overland (interflow) reservoir multiplier, which varies from 0.01 to 1 [1]. "pwm" is the capacity of the soil to hold water [31]. "pb" is an infiltration parameter, the exponent of the variable infiltration curve, while "pim" is the percentage of impervious area, derived from land cover data [1]. "pke" is a multiplier that converts PET to local actual evapotranspiration [29]. "pfc" is the saturated hydraulic conductivity of the soil, derived from soil type data [1]. The initial values of these parameters are adjusted through calibration against streamflow measured at gauging stations [6], as discussed in the next section.
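The reading of "coem" as an inverse Manning roughness can be made concrete with a hedged sketch. Assuming, for illustration, a Manning-like speed v = coem * sqrt(S), with the hydraulic-radius term absorbed into the multiplier, the code converts slope and coem into a flow speed and a travel time across one grid cell. This functional form, the 4 km cell length, and the 1% slope are assumptions, not CREST's documented routing equations; the two coem values are the calibrated extremes reported for the CRB basins.

```python
import math

def overland_velocity(coem, slope):
    """Hypothetical Manning-like speed (m/s): v = coem * sqrt(slope), where coem
    lumps 1/n and the hydraulic-radius term into one multiplier."""
    return coem * math.sqrt(slope)

def cell_travel_time(coem, slope, cell_length_m):
    """Seconds needed to cross one grid cell at the speed above."""
    return cell_length_m / overland_velocity(coem, slope)

for coem in (10.39, 74.33):   # smallest and largest calibrated coem values
    v = overland_velocity(coem, slope=0.01)
    t = cell_travel_time(coem, slope=0.01, cell_length_m=4000.0)
    print(f"coem={coem}: v={v:.2f} m/s, travel time={t / 3600:.2f} h")
```

Under this assumed form, the calibrated coem range implies roughly a sevenfold spread in overland travel time between basins, one physical reading of the parameter variability discussed below.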

Model Calibration and Validation
CREST was calibrated and validated over the Connecticut River Basin at selected stream gauges (Figure 2). Bergstrom [12] defined model calibration as the process of adjusting model parameters so that model results match the measurements. When many parameters are calibrated, automatic calibration is preferable because it reduces manual effort. An automatic approach also shortens calibration time by exploiting the speed of high-performance computers, and it removes subjective human judgment from the process [9,16]. The automatic calibration routine used here is based on the Differential Evolution Adaptive Metropolis (DREAM) method [15].
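DREAM runs multiple Markov chains with differential-evolution proposals plus refinements such as randomized subspace sampling and outlier-chain handling, which are beyond a short example. As a hedged illustration of the core idea only, the sketch below implements the simpler Differential Evolution Markov Chain (DE-MC) algorithm on which DREAM builds, and uses it to calibrate a hypothetical two-parameter linear-reservoir model against synthetic data. The Gaussian likelihood, the noise level sigma, the toy model, and all function names are assumptions; this is not the study's calibration code.

```python
import numpy as np

def linear_reservoir(rain, runoff_coef, k):
    """Toy model: a fraction of rain enters one linear reservoir released at rate k."""
    storage, q = 0.0, np.empty(len(rain))
    for t, p in enumerate(rain):
        storage += runoff_coef * p
        q[t] = k * storage
        storage -= q[t]
    return q

def de_mc(log_post, bounds, n_chains=8, n_iter=1500, seed=0):
    """Differential Evolution Markov Chain sampler (simplified precursor of DREAM).

    Returns the best parameter vector visited, its log-posterior, and final chain states.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = lo.size
    x = rng.uniform(lo, hi, size=(n_chains, d))          # initial chains from the prior box
    lp = np.array([log_post(xi) for xi in x])
    best = int(np.argmax(lp))
    best_x, best_lp = x[best].copy(), lp[best]
    for it in range(n_iter):
        # Standard jump rate, with gamma = 1 every 10th iteration to allow mode jumping.
        gamma = 1.0 if it % 10 == 9 else 2.38 / np.sqrt(2 * d)
        for i in range(n_chains):
            others = [j for j in range(n_chains) if j != i]
            r1, r2 = rng.choice(others, size=2, replace=False)
            prop = x[i] + gamma * (x[r1] - x[r2]) + rng.normal(0.0, 1e-4 * (hi - lo))
            if np.any(prop < lo) or np.any(prop > hi):
                continue                                 # reject proposals outside the box
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp[i]:  # Metropolis acceptance rule
                x[i], lp[i] = prop, lp_prop
                if lp_prop > best_lp:
                    best_x, best_lp = prop.copy(), lp_prop
    return best_x, best_lp, x

rain = np.zeros(60)
rain[[2, 15, 30, 31, 45]] = [10.0, 5.0, 8.0, 4.0, 6.0]
observed = linear_reservoir(rain, 0.6, 0.2)   # synthetic "observations"
sigma = 0.1                                   # assumed Gaussian observation error

def log_post(theta):
    sim = linear_reservoir(rain, theta[0], theta[1])
    return -0.5 * np.sum((sim - observed) ** 2) / sigma ** 2

best, _, _ = de_mc(log_post, bounds=[(0.0, 1.0), (0.01, 1.0)])
print(best)
```

Because the chains sample the posterior rather than only climbing toward an optimum, the final chain states also give a rough picture of parameter uncertainty, which is one reason Metropolis-type samplers such as DREAM are used for calibration.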
The number of parameters subject to calibration was limited to the 14 defined by Wang et al. [1]. Calibrated values are listed in Table 2, along with the acceptable intervals (Table 1) used to constrain the parameter search. Each basin was calibrated and validated separately to capture the spatial variability of the parameters over the region. Although all parameters fall within their acceptable ranges, their values vary across basins. For example, the slope flow speed multiplier "coem" varied between 10.39 and 74.33, with a mean of 34.93 and a coefficient of variation of 0.61. The maximum soil water capacity "pwm" has the largest coefficient of variation, 1.15.
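The coefficient of variation (CV) quoted above is the standard deviation divided by the mean, so a mean of 34.93 with CV 0.61 implies a standard deviation of roughly 21.3 for coem. The sketch below computes it for one parameter across basins; whether the study used the sample (ddof=1) or population (ddof=0) standard deviation is not stated, so that choice is an assumption, and the per-basin values are hypothetical rather than taken from Table 2.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Standard deviation / mean of one calibrated parameter across basins.

    ddof=1 uses the sample standard deviation, ddof=0 the population one
    (the convention used in the study is an assumption here).
    """
    v = np.asarray(values, dtype=float)
    return np.std(v, ddof=ddof) / np.mean(v)

# Hypothetical per-basin values for one parameter (not the study's Table 2 values).
example = [12.0, 20.0, 35.0, 50.0, 70.0]
print(round(coefficient_of_variation(example), 2))
```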
Each simulation starts with a warm-up period to reduce the effect of initial conditions. Specifically, the three parameters related to initial conditions (iwu, iso, and isu) are model states that were adjusted automatically by running CREST through the warm-up period. Using a 40-year record for the Connecticut River Basin, Marshall and Randhir [23] observed that maximum snow accumulation occurs in January and decreases exponentially through April. To avoid snowmelt effects on parameter calibration, the period from January through mid-March was excluded from the error analysis.
Finally, after parameter calibration, CREST was validated over a four-year period. Refsgaard [10] defined model validation as the process of demonstrating that the calibrated parameter set, without further adjustment, can reproduce flows for periods other than the calibration period. The validation simulation was again run continuously through the snow accumulation and melt season, but the snow period (January 1 to March 15) was excluded from the analysis.
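The evaluation windows can be expressed as boolean masks over the hourly record. The sketch below encodes the periods stated in the text (calibration 2005-2007, validation 2003-2004 and 2008-2009, January 1 to March 15 excluded) and assumes that 2002, which neither period covers, served as the one-year spin-up. It also computes the share of missing observations inside a window, the kind of quantity reported alongside the error metrics. Function names are hypothetical.

```python
import numpy as np

def analysis_masks(times):
    """Return (calibration, validation) boolean masks for hourly timestamps."""
    t = np.asarray(times, dtype="datetime64[h]")
    year = t.astype("datetime64[Y]").astype(int) + 1970
    month_start = t.astype("datetime64[M]")
    month = month_start.astype(int) % 12 + 1
    day = (t.astype("datetime64[D]") - month_start.astype("datetime64[D]")).astype(int) + 1
    snow = (month <= 2) | ((month == 3) & (day <= 15))   # Jan 1 - Mar 15 excluded
    calib = (year >= 2005) & (year <= 2007) & ~snow
    valid = (((year >= 2003) & (year <= 2004)) | ((year >= 2008) & (year <= 2009))) & ~snow
    return calib, valid

def percent_missing(obs, mask):
    """Share (%) of time steps inside `mask` that have no observed discharge."""
    return 100.0 * np.isnan(np.asarray(obs, dtype=float)[mask]).mean()

times = np.array(["2005-03-15T23", "2005-03-16T00", "2003-07-01T12", "2002-06-01T00"],
                 dtype="datetime64[h]")
calib, valid = analysis_masks(times)
print(calib, valid)
```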

Evaluation Indexes
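Different error metrics have been used to validate hydrologic models in past studies [9,16,32], and multiple error criteria help describe different aspects of a hydrograph. In this study, the ability of the model to predict streamflow at basin outlets was assessed qualitatively, by plotting time series of observed and simulated streamflow, and quantitatively, by computing error statistics from hydrographs normalized by the corresponding catchment area. With $Q_{obs,i}$ and $Q_{sim,i}$ the observed and simulated discharge at time step $i$, $n$ the number of time steps, and overbars denoting means over the analysis period, the error metrics are as follows.

Nash-Sutcliffe Coefficient of Efficiency (NSCE):
$$NSCE = 1 - \frac{\sum_{i=1}^{n} (Q_{obs,i} - Q_{sim,i})^2}{\sum_{i=1}^{n} (Q_{obs,i} - \bar{Q}_{obs})^2}$$
NSCE ranges from $-\infty$ to 1. A value of 1 indicates a perfect fit, and a value of 0 indicates that the simulation is no better than the mean of the observations.

Mean Relative Error (MRE):
$$MRE = \frac{\sum_{i=1}^{n} (Q_{sim,i} - Q_{obs,i})}{\sum_{i=1}^{n} Q_{obs,i}}$$
MRE indicates how close the predictions are to the observations in aggregate. MRE = 0 means that the total simulated discharge is unbiased relative to the observations; positive values indicate overestimation and negative values underestimation.

Root Mean Square Error (RMSE):
$$RMSE = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (Q_{sim,i} - Q_{obs,i})^2}}{\bar{Q}_{obs}} \times 100\%$$
RMSE measures the magnitude of the differences between simulated and observed discharges relative to the mean observed discharge. A low RMSE indicates a better fit, and a value of zero signifies a perfect fit.

Pearson Correlation Coefficient (PCC):
$$PCC = \frac{\sum_{i=1}^{n} (Q_{obs,i} - \bar{Q}_{obs})(Q_{sim,i} - \bar{Q}_{sim})}{\sqrt{\sum_{i=1}^{n} (Q_{obs,i} - \bar{Q}_{obs})^2}\,\sqrt{\sum_{i=1}^{n} (Q_{sim,i} - \bar{Q}_{sim})^2}}$$
where $\bar{Q}_{sim}$ is the mean of all simulated streamflow values. PCC measures the strength of the linear relationship between measurements and predictions and ranges from -1 to 1. PCC = 0 indicates no linear relationship, and values closer to -1 or 1 indicate a stronger correlation.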
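The four metrics can be computed directly from paired observed and simulated series. The sketch below follows the formulas above; dropping time steps where either series is missing is an assumption about how unavailable observations were handled, and the function names are illustrative.

```python
import numpy as np

def _paired(sim, obs):
    """Drop time steps where either series is missing (NaN)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~(np.isnan(sim) | np.isnan(obs))
    return sim[ok], obs[ok]

def nsce(sim, obs):
    """Nash-Sutcliffe coefficient of efficiency."""
    s, o = _paired(sim, obs)
    return 1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2)

def mre(sim, obs):
    """Relative error of total volume; > 0 means overestimation."""
    s, o = _paired(sim, obs)
    return (s.sum() - o.sum()) / o.sum()

def rmse_pct(sim, obs):
    """Root mean square error as a percentage of the mean observed discharge."""
    s, o = _paired(sim, obs)
    return 100.0 * np.sqrt(np.mean((s - o) ** 2)) / o.mean()

def pcc(sim, obs):
    """Pearson correlation between observed and simulated series."""
    s, o = _paired(sim, obs)
    return np.corrcoef(o, s)[0, 1]

obs = [1.0, 2.0, 3.0, 4.0, np.nan]
sim = [2.0, 3.0, 4.0, 5.0, 6.0]   # constant +1 offset; the last step has no observation
print(nsce(sim, obs), mre(sim, obs), rmse_pct(sim, obs), pcc(sim, obs))
```

For this toy series the constant offset gives NSCE of about 0.2, MRE = 0.4, RMSE = 40%, and PCC = 1, illustrating why the metrics are used together: PCC is insensitive to a constant bias that MRE and NSCE both penalize.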
Boyle et al. [16] showed that RMSE is sensitive to peak flows and can strongly bias the characterization of recession errors. PCC, on the other hand, reflects the collinearity between simulations and observations, and correlation-based measures are overly sensitive to peak flows [32]. MRE measures the average tendency of the simulated values relative to the observations, that is, whether the model tends to overestimate or underestimate [9]. Servat and Dezetter [33] pointed out that although NSCE has some weaknesses for low flows, it is the best objective function for conveying overall information on hydrograph prediction accuracy. Figure 3 shows observed and simulated normalized hourly discharges, together with measured rainfall, for 2004 and 2005 at multiple stations. For brevity, only three CRB sub-basins of different sizes are shown. The first and second columns of Figure 3 correspond to the validation and calibration periods, respectively. Three catchments of different sizes (small, medium, and large), B0128500 (673 km²), B01122500 (1,046 km²), and B01184000 (25,019 km²), are shown from top to bottom. Overall, the hydrographs show that the model performed well over the CRB and that the simulated flow variability agrees closely with the observed flows. Although the timing of the peaks is captured well, CREST tends to slightly underestimate some peak flows. For example, in April 2005 the simulations underestimated observed peaks associated with small amounts of rainfall. Since light rainfall would be expected to produce little runoff, this mismatch in peak discharge may reflect uncertainties in either the rainfall or the streamflow observations.
Although slight underestimation is evident in the qualitative streamflow comparisons, most simulated discharges are close to the measurements.

CREST
The results of the hydrological simulations are summarized in Tables 3A and 3B for the calibration and validation periods, respectively. The tables report basin areas, observed and simulated mean discharges, error metrics (NSCE, MRE, RMSE, and PCC), and the percentage of unavailable observations during the analysis period. During the calibration period (Table 3A), NSCE values range from 0.31 to 0.68 across the nine basins, while MRE ranges from -6% to 13%. The smallest hourly PCC value is 0.60. These NSCE, MRE, and PCC values show that the simulations agree well with the measurements in the calibration period. The RMSE values, however, make the quality of the results harder to judge at some stations.
In the validation period, hourly NSCE is lower (0.12 to 0.58), and PCC ranges from 0.42 to 0.77 across the nine basins. Interestingly, during validation MRE indicated overall underestimation of 4% to 26%. RMSE values, however, dropped by 2% to 20% in the basins, indicating a smaller random error.
Moriasi et al. [9] showed that hydrologic model performance is better for longer time steps (e.g., annual versus monthly) and stated that simulation statistics improve as the time step lengthens. Fernandez et al. [34] supported this finding, reporting calibration NSCE values of 0.36 and 0.66 for daily and monthly time steps, respectively. Daily error metrics were therefore also calculated and are given in Table 3C.
Agreement between simulated and measured flows is better at the daily scale. For instance, in B01205500, NSCE increases from 0.31 (hourly) to 0.47 (daily) during calibration and from 0.16 to 0.30 during validation. PCC improves from 0.60 to 0.70 (calibration) and from 0.46 to 0.59 (validation) in the same catchment, whereas MRE does not change, as expected, because aggregation to coarser time steps mainly affects the random component of the error. Similar improvements are found for the other basins (Table 3C). Moreover, RMSE values drop substantially and fall below 100% in both the calibration and validation periods in all basins except B01193500 (103.91%).
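Why NSCE improves while MRE stays put under daily aggregation can be checked numerically. The sketch below uses a hypothetical synthetic hourly series (a slow seasonal-like signal, a 5% volume bias, and random hourly noise) and averages complete 24-hour blocks into daily means. Averaging damps the random noise, which raises NSCE, but it preserves total volume, so MRE is unchanged as long as no hours are missing.

```python
import numpy as np

def hourly_to_daily(q):
    """Average complete 24-hour blocks of an hourly series into daily means."""
    q = np.asarray(q, dtype=float)
    n_days = q.size // 24
    return q[: n_days * 24].reshape(n_days, 24).mean(axis=1)

def nsce(sim, obs):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mre(sim, obs):
    return (sim.sum() - obs.sum()) / obs.sum()

rng = np.random.default_rng(1)
hours = np.arange(24 * 120)
obs_h = 50.0 + 30.0 * np.sin(2 * np.pi * hours / (24 * 30))   # synthetic hourly "observations"
sim_h = 0.95 * obs_h + rng.normal(0.0, 10.0, hours.size)       # 5% bias plus random error
obs_d, sim_d = hourly_to_daily(obs_h), hourly_to_daily(sim_h)

nse_h, nse_d = nsce(sim_h, obs_h), nsce(sim_d, obs_d)
mre_h, mre_d = mre(sim_h, obs_h), mre(sim_d, obs_d)
print(f"hourly: NSCE={nse_h:.3f} MRE={mre_h:.4f}")
print(f"daily:  NSCE={nse_d:.3f} MRE={mre_d:.4f}")
```

With gaps in the record, daily means computed over the available hours would weight days unequally, so small MRE differences between hourly and daily results would then be possible.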
The annual fluctuations of the error metrics were also examined. Three parent basins, whose nested interior sub-basins were treated as ungauged catchments, were considered: B01122500 (Figure 5), B01127000 (Figure 6), and B01205500 (Figure 7). After the model was calibrated at the outlet of each parent basin, the parameters were fixed and CREST was used to simulate the interior catchments. Figures 5-7 compare model outputs with measurements through the error metrics for each year, with the catchments ordered by increasing area. Differences in model response increase with catchment size. Overall, the predictions agree well with the measurements.
Figure 5 presents NSCE, MRE, RMSE, and PCC for different years and for the interior sub-basins of parent basin B01122500. The highest NSCE values are obtained for the largest basin (B01122500), while the small basin (B0119500) shows the poorest agreement. RMSE also depends on basin size, with the highest values for the smaller basins. MRE values, on the other hand, are within ±0.5, and PCC values are above 0.5 overall.
To examine the effects of the calibration and validation periods and of time resolution on the error metrics, summary statistics were computed and are reported in Table 4. Table 4 gives hourly and daily statistics for the calibration and validation periods for the ungauged interior sub-basins of parent basin B01122500. As the table shows, the number of observed time steps varies with catchment area. It is also important to note that the outlet of the small basin is the farthest from the calibrated downstream outlet, so simulation performance is expected to be poorer for the smaller catchment. Changing the time resolution from hourly to daily yields slight improvements, and the daily statistics are reported in part (C). For B01124000, NSCE values are similar in the calibration and validation periods. MRE decreases from 0.41 during calibration to 0.23 during validation. RMSE drops by 20% from the calibration to the validation period, while PCC changes little, remaining around 0.7. All metrics change only slightly with time resolution in both the calibration and validation periods.
Figure 6 shows year-to-year variability for parent basin B01127000 and its nested catchment. The error metrics of the two basins follow the same trend over the 7-year period, with higher accuracy for the larger basin (B01127000). Overall, this suggests that the model provides a reasonably good description of the year-to-year fluctuations.
Finally, Figure 7 examines the largest calibrated parent basin (B01205500) and its interior nested basins. The smallest basin (B01199000) tends to overestimate according to its MRE values, whereas the other two basins show negative bias. Over the 7-year period, NSCE values are below 0.5, while PCC values are above 0.3. RMSE values improve with basin size.
Table 6 summarizes model efficiency during the validation (section A) and calibration (section B) periods at hourly resolution, and at daily resolution (section C), for B01199000 and B01200500 (treated as internal ungauged basins) and B01205500 (the parent basin). MRE ranges from -0.02 to 0.21 for calibration and from -0.15 to 0.07 for validation. Correlation is around 0.6 during calibration; in the validation period it drops to 0.46 for the parent basin and improves to 0.65 for the medium basin (B01200500). For the daily simulations, all RMSE values are below 100%. These statistics show that the CREST simulations represent the observations at the interior basins well.

Table 5: Same as in Table 3, but for the ungauged interior subbasin of parent 01127000 basin.

Figures 8 and 9 compare the simulations by flow quantile at the basin outlets and at the interior ungauged locations, respectively. MRE and RMSE were calculated at the 0.2, 0.5, 0.7, 0.9, and 0.95 quantiles and are shown for each station as semi-transparent dots. From low to high flows, RMSE decreases and the magnitude of the MRE increases. The poorest MRE and RMSE performance occurs at the higher quantiles and is attributed to inadequate representation of peak events. The figures confirm the tendency to underestimate peak discharges, and this consistency supports using frequency analysis with large flow data sets to improve model results.
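The text does not spell out how the quantile-based errors were computed. One plausible reading, adopted here purely as an assumption, is to evaluate MRE and relative RMSE over the time steps whose observed flow is at or above a given quantile of the observed record. The sketch below does this for the quantiles listed above, using a hypothetical series in which the model underestimates only the largest flows.

```python
import numpy as np

def metrics_above_quantile(sim, obs, q):
    """MRE and relative RMSE (%) over time steps whose observed flow is at or above
    the q-th quantile of observed flow (an assumed definition)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~(np.isnan(sim) | np.isnan(obs))          # keep paired, non-missing steps
    sim, obs = sim[ok], obs[ok]
    sel = obs >= np.quantile(obs, q)
    s, o = sim[sel], obs[sel]
    mre = (s.sum() - o.sum()) / o.sum()
    rmse_pct = 100.0 * np.sqrt(np.mean((s - o) ** 2)) / o.mean()
    return mre, rmse_pct

obs = np.arange(1.0, 101.0)
sim = np.where(obs > 90, 0.8 * obs, obs)   # hypothetical model that misses the largest flows
results = {q: metrics_above_quantile(sim, obs, q) for q in (0.2, 0.5, 0.7, 0.9, 0.95)}
for q, (m, r) in results.items():
    print(q, round(m, 3), round(r, 1))
```

Under this definition, a peak-only underestimation shows up as an MRE that becomes more negative at higher quantiles, the pattern described for Figures 8 and 9.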

Table 6: Same as in Table 3, but for the ungauged interior subbasins of parent 01205500 basin.

Conclusions
This case study has demonstrated the implementation of the CREST model, together with its calibration and validation at high temporal resolution, using spatially distributed gauged observations as well as ungauged or poorly gauged catchments in the Connecticut River Basin. The primary objective was to calibrate and validate a distributed hydrological model against streamflow observations and to provide an explicit statistical analysis of the simulations using different objective functions. A further objective was to use the calibrated CREST model to produce flow data where historical records at gauging stations are insufficient and to predict flows at ungauged locations in the Connecticut River Basin. To this end, the basin was partitioned into nine sub-basins, based on the locations of the USGS monitoring stations, to represent distributed parameter sets in CREST, and the model was evaluated with several important statistics and supplementary hydrographs.
In this study, several statistical indicators and supplementary graphical comparisons were used to evaluate CREST, since each statistic captures a different aspect of the hydrograph. Based on the error metrics and graphical comparisons, CREST provides accurate rainfall-runoff simulations during both the calibration and validation periods, and its overall performance over the Connecticut River Basin is satisfactory.
The CREST hydrological model estimates hourly flow well across spatially distributed catchments, although errors are relatively large at the upper flow quantiles. Taking into account the statistical metrics and normalized hydrographs, we conclude that CREST can reproduce continuous hourly streamflow in the Connecticut River Basin, which supports its use for flood management in both gauged and ungauged basins. The results show that model performance was satisfactory in representing the amplitude and timing of flow peaks, and that the model performs better over the hydrograph as a whole than at the peaks. The model and its calibrated parameters can be used in future work such as comparison with satellite-driven flow simulations, flash flood prediction, flood frequency analysis, and climate analysis at local and national scales.