Performance of conceptual and black-box models in flood warning systems

Flood forecasting is the core of a flood forecasting and flood warning system, and it can be implemented with either a conceptual rainfall–runoff (CRR) model or a black-box rainfall–runoff (BBRR) model. In this paper, a dynamic artificial neural network (DANN), an innovative BBRR model, and HEC-HMS, a traditional CRR model, were used for flood forecasting. The aim is to compare the efficiency of HEC-HMS and DANN in determining the flood warning lead-time (FWLT) in a steep urbanized watershed. A framework is proposed to compare the models on four criteria: the type and quantity of input data required by each model, flood simulation performance, FWLT and expected lead-time (ELT). The results show that DANN estimated longer FWLT and ELT than HEC-HMS. Because the BBRR model requires less data for forecasting and gives a longer ELT, future research should focus on improving its verification.
Subjects: Civil, Environmental and Geotechnical Engineering; Water Engineering; Water Science


ABOUT THE AUTHOR
Mohammad Ebrahim Banihabib holds a PhD in water resource engineering from Kyushu University, Japan. He is currently an associate professor and head of the Department of Irrigation and Drainage Engineering, University of Tehran, Iran. He has studied debris flow experimentally and numerically, and has proposed empirical equations for its roughness and sediment transport as well as a mathematical model for the deposition of debris flow in floodplains. In recent decades, he has focused on three major research topics: experimental and numerical research on flash flood and debris flood control, innovative models (such as ANN and hybrid models) for flood and streamflow forecasting, and strategic planning of water resources. In this paper, he focuses on producing a sophisticated model for flood forecasting and flood warning. A flood forecasting and flood warning system is a non-structural measure and an effective flood control technique for mitigating flood consequences. Furthermore, it can improve the efficiency of flood management plans.

PUBLIC INTEREST STATEMENT
Population growth drives urbanization, and urbanization produces more floods. Developing urban areas in floodplains also intensifies the flood risk for residents. Earlier flood warnings can be used to evacuate flood-prone areas, so appropriate models are always needed to forecast floods and warn of them. Frequent flash floods are reported from around the world. The greatest successes in flood forecasting have generally been achieved on large rivers. Flash urban floods associated with intense storms in highly populated areas of urbanized watersheds, however, need sophisticated models. The aim of this research is to compare the efficiency of HEC-HMS (a traditional model) with the proposed DANN model in determining the flood warning lead-time. The research shows that, because the DANN model requires less data for flood forecasting and estimates a longer lead-time, it merits further research.

Introduction
Urbanized watersheds in mountainous areas around the world are affected by flash floods, which cause casualties and property losses, and therefore need appropriate models to forecast floods and warn of them. The greatest successes in flood forecasting have generally been achieved on large rivers. Flash urban floods associated with intense storms in highly populated areas, however, are often highly uncertain and harder to forecast because of the complex dynamic phenomena involved (Chang, Chen, Lu, Huang, & Chang, 2014). Frequent flash floods have been reported in Southern Africa (Du Plessis, 2004), Malaysia (Alaghmand, Abdullah, Abustan, & Eslamian, 2012), the USA (Johnson, 2000), Oman (Al-Rawas, 2009), Korea (Kim & Choi, 2012), Europe (Gaume et al., 2009; Lesage & Ayral, 2007) and Iran (Golian, Saghafian, & Maknoon, 2010), and they require measures to reduce their impacts (Du Plessis, 2004). In addition, because of climate change, floods are considered one of the most significant growing natural threats in the world (Choi, 2004; Hegedus, Czigany, Balatony, & Pirkhoffer, 2013). These studies show that flash floods threaten lives and property in urbanized areas downstream of steep watersheds; thus, appropriate models are needed for a flood forecasting and flood warning system (FFFWS).
An FFFWS is a non-structural measure and an efficient flood control technique for reducing flood consequences. Furthermore, flood forecasting can be used to improve the effectiveness of flood management plans (Andjelkovic, 2001; Li, Chau, Cheng, & Li, 2006; Liu & Chan, 2003). Yet the effect of urbanization on the flood warning lead-time (FWLT) should be considered when appraising the efficiency of FFFWSs. FWLT is the period between the detection that the flood will exceed a specific flow threshold and the onset of flood damage. During this period, authorities should be able to implement an action plan to reduce flood consequences. In principle, the longer the FWLT, the greater the chance of reducing the negative consequences of a flood. Studies show that urbanization will slightly increase the negative consequences of floods in the next 30 years (De Roo, 1999; De Roo, Schmuck, Perdigao, & Thielen, 2003). Because the short time of concentration of the watershed leaves little time for emergency action, these consequences are more substantial in highly populated areas such as Tehran, Iran, which has steep watersheds to its north. Therefore, existing flood forecasting models should be examined to determine their ability to forecast flash floods in such highly populated areas.
Flood forecasting is the central part of an FFFWS and can be implemented with a hydrological rainfall-runoff model. Hydrological rainfall-runoff models can be classified into two categories according to whether they incorporate physical phenomena: conceptual rainfall-runoff (CRR) models and black-box rainfall-runoff (BBRR) models. Numerous FFFWSs have been presented; they usually consist of several interacting modules, including at least a rainfall-runoff model. In small watersheds, real-time flood forecasting mostly relies on either a black-box or a conceptual rainfall-runoff model. Black-box models do not incorporate the hydrological processes within the catchment, even in a simplified manner. However, they can be trained and verified easily in a flexible context, which has made them attractive in flood forecasting (Brath, Montanari, & Toth, 2002). Conceptual models, in contrast, permit a description of the spatial and temporal conservation and response laws in the watershed. Various scholars have simulated floods in different watersheds with diverse rainfall-runoff forecasting models (Kafle et al., 2007; Vieux & Moreda, 2003), and numerous rainfall-runoff models have been developed on a conceptual basis (Feldman, 1995; Kafle et al., 2007; Vieux & Moreda, 2003). However, the large amount of field data needed to calibrate and verify conceptual models has limited their application, and there is still a strong need for alternative models (Chiang, Chang, & Chang, 2004). Therefore, further investigation is needed to find appropriate models for forecasting flash floods among the existing black-box and conceptual models.
HEC-HMS, a CRR model developed by the US Army Corps of Engineers, has been widely applied for flood forecasting (William & Matthew, 2010). Yet its effectiveness in determining FWLT should be examined in steep urbanized watersheds. CRR models have been extensively applied since 1961 (Kafle et al., 2007; Sugawara, 1961; Vieux & Moreda, 2003; William & Matthew, 2010; Zhijia, Xiangguang, & Chuwang, 1998). A CRR model is developed from spatial and temporal conservation and response laws that relate rainfall to the watershed's features. The model parameters are usually watershed specific and are obtained by calibration against observed floods. Conventionally, a single set of parameters is expected to be associated with a watershed and to be appropriate for various kinds of flood events (Minglei, Bende, Liang, & Guangtao, 2010). Some scholars have investigated the run-off of a watershed with a CRR model. For instance, Vieux and Moreda (2003) simulated flash floods and debris floods of a watershed in Taiwan with the Veflo™ model. Kafle et al. (2007) investigated the floods of the Bagmati Watershed in Nepal using HEC-HMS. Their results show that the observed and simulated low flows do not agree precisely, but the model properly predicted the peak flow. CRR models such as HEC-HMS were considered appropriate for flood forecasting (Kafle et al., 2007). Still, the effect of urbanization on FWLT needs to be addressed in order to judge flood mitigation efficiency. Therefore HEC-HMS, as a widely used program for flood forecasting, is a typical CRR model whose efficiency in determining FWLT can be compared with that of a BBRR model.
HEC-HMS, as a widely used CRR model, and DANN, as an innovative BBRR model, were used for flood forecasting. Their effectiveness should be evaluated by contrasting their abilities to determine the FWLT of flash floods in steep urbanized watersheds. Consequently, the aim of this paper is to compare the efficiency of HEC-HMS and DANN in determining FWLT in the north of Tehran, a steep urbanized watershed.

Case study
The Tajrish watershed is a steep urbanized watershed located in the north of the metropolitan city of Tehran, Iran, and was selected as the case study for this research (Figure 1). With a gross slope of 25.6% and an area of 3,285 ha, it is a steep basin and the main source of flash floods in the north of Tehran. Figure 2 shows the land cover of the watershed. According to Figures 1 and 2, part of the watershed has been urbanized in recent decades. The watershed is therefore an appropriate case for testing the performance of CRR and BBRR models in a steep urbanized watershed.

Calibration, verification and flood forecasting phases and required data
Characteristics of the watershed were utilized for calibration, verification and flood forecasting by the models. Table 1 lists the data used for calibration/training and verification, as well as the data required for flood forecasting. The HEC-HMS model needs the watershed's physical characteristics, such as land cover data (Figures 1 and 2) and the area and slope of sub-basins, together with observed rainfall, the cumulative rainfall hyetograph (Figure 3) and observed flood hydrographs (Figure 6(a) and (c)). The DANN model does not require the watershed's physical characteristics. The 10,000-year flood was then simulated by HEC-HMS using the cumulative rainfall hyetograph of Niyavaran Meteorological Station shown in Figure 3 (Banihabib, 1997). Finally, the 10,000-year flood was used for training the DANN model, and the trained DANN was verified as mentioned above and used for the assessment of FWLTs. Accordingly, using these data, the models were calibrated, verified and prepared for flood forecasting. HEC-HMS is used as the CRR model to determine FWLT. It is a new generation of rainfall-runoff simulation models developed by the US Army Corps of Engineers (William & Matthew, 2010). First, the HEC-HMS model was calibrated using the observed data listed in Table 1. Since estimating the peak flow is very important for flood control, the initial curve numbers (CNs) of the sub-basins were determined and then calibrated by minimizing the error of peak flow (EPF), given by Equation (1).

Framework for comparing the models
In Equation (1), Q_p0 is the observed peak flow and Q_ps is the simulated peak flow. Since the Muskingum-Cunge method has a hydraulic basis and produces more accurate results, it was used for river flood routing.
The DANN is proposed as a BBRR model in this paper; it uses a recurrent mechanism to handle memory (Figure 5). The recurrent mechanism means that the output of the neurons in the output layer can also be applied as input to the DANN. The characteristics of the structure of the proposed DANN model fall into two classes: general characteristics (GCs) and tested characteristics (TCs). The GCs are the number of layers and the activation functions, which were chosen on the basis of suggestions from previous research (Chang et al., 2014). A three-layered DANN (input, hidden and output) is flexible enough to capture any nonlinear function and was used in this paper. The activation functions of the hidden and output layers of the proposed DANN were of sigmoid and linear type, respectively (Chang et al., 2014). The proposed structure was completed by selecting the TCs of the DANN structure. The TCs were the number of delayed inputs, the number of outputs recurred to the input and the number of neurons in the hidden layer, which were determined by minimizing the flood forecasting error (FFE) through a trial-and-error process (Equation 2) (Banihabib, Arabi, & Salha, 2015). In Equation (2), RMSE_a and REPE_a are the averages of the root mean square error and of the relative error of peak flow over the return periods of 25, 50, 100, 200, 1,000 and 10,000 years. By minimizing the FFE, the number of delayed inputs (K), the number of outputs recurred to the input (H) and the number of neurons in the hidden layer were determined as 37, 30 and 28, respectively (Banihabib et al., 2015). A DANN model forecasts floods using Equation (3), where F is the function of the DANN, and p_i(t) and Q(t) are the rainfall and flow data at time step t, respectively. K and H are the number of delayed inputs and the number of outputs recurred to the input, which are TCs of the DANN.
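The recurrent mechanism described above can be illustrated with a minimal sketch. It assumes that the network input at each time step consists of the K most recent rainfall values and the H most recent forecasted flows; this ordering, the class name DannSketch and the random, untrained weights are all hypothetical and serve only to show how forecasted flow is fed back as input. It is not the trained model of this study.

```python
import math
import random


def sigmoid(x):
    """Sigmoid activation, used for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


class DannSketch:
    """Hypothetical three-layer recurrent network: sigmoid hidden layer, linear output."""

    def __init__(self, k_delays, h_recurred, n_hidden, seed=0):
        rng = random.Random(seed)
        n_inputs = k_delays + h_recurred
        self.k = k_delays
        self.h = h_recurred
        # Untrained placeholder weights (assumption: small random values).
        self.w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def step(self, rain_window, flow_window):
        """One forward pass: delayed rainfall plus recurred flows -> next flow."""
        x = list(rain_window) + list(flow_window)
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out

    def forecast(self, rainfall):
        """Run the network over a rainfall series, feeding each output back in."""
        rain_window = [0.0] * self.k
        flow_window = [0.0] * self.h
        flows = []
        for p in rainfall:
            # Shift the newest rainfall value into the delayed-input window.
            rain_window = ([p] + rain_window)[:self.k]
            q = self.step(rain_window, flow_window)
            # Recurrent mechanism: the forecasted flow becomes a future input.
            flow_window = ([q] + flow_window)[:self.h]
            flows.append(q)
        return flows


# Structure reported in the text: K = 37, H = 30, 28 hidden neurons.
model = DannSketch(k_delays=37, h_recurred=30, n_hidden=28)
flows = model.forecast([0.5] * 60)  # illustrative rainfall series, not study data
```

With K = 37 and H = 30, each hidden neuron in this sketch receives 67 inputs; in practice the weights would be obtained by training against a target hydrograph.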

Determination of FWLT
The next stage of the framework was to estimate the FWLT and the expected lead-time (ELT) of flood warning. An FFFWS can reduce flood damage and protect lives if it is appropriately designed and operated. The main purpose of an FFFWS is to give advance warning of a flood so that an action plan can be applied to evacuate at-risk areas (Pingel, Jones, & Ford, 2005). Specifically, the maximum possible warning time is the duration between the first forecastable or observed rainfall and the time at which the flood flow exceeds the threshold for a flood risk to life or property at a critical location (Dotson & Peters, 1990). Once the risk is recognized, the time remaining until the threshold is exceeded is the forecast lead-time. Initially, system operators need time to gather and evaluate the available data and to forecast from them. This data collection and evaluation time is the forecast recognition time (T_F). The forecast lead-time (FWLT) is the difference between the time of risk recognition (T_F) and the time at which the flow exceeds the threshold limit (T_E).
FWLT was estimated for return periods of 25, 50, 100, 200, 1,000 and 10,000 years. As noted, the trained DANN and calibrated HEC-HMS models were applied as part of the process. First, according to the Iranian regulation for rivers, the minimum discharge at which flood damage starts, which is equal to the peak flow of the 25-year flood (Q_25), was defined as the threshold flow for warning (TFW) (Standard and Technical Criteria Office, 1997). Second, to determine the rainfall used for flood forecasting, a time step (dt) was selected. Typically, this value is chosen based on the time increment of the DANN and HEC-HMS models (in this case, one minute). In the next step, T_F (the duration of rainfall) was set equal to dt, and the rainfall that occurred during that dt was applied for the flood forecast. Once the forecast was complete, the forecasted peak flow was compared with the TFW. If the forecasted peak flow was less than the TFW, the precipitation during T_F was not enough to pose a risk downstream of the watershed. Consequently, T_F was increased by another dt and the flood was forecasted again. This process was repeated until the forecasted peak flow was equal to or greater than the TFW. Once the flow threshold was exceeded, a warning should be issued, and the FWLT was determined. An expected lead-time (ELT), which uses 1/T as the weight of each FWLT, where T is the return period, was used for contrasting the performance of the models in flood warning.
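The iterative procedure above can be sketched as follows. The function names, the forecast_peak callable (a stand-in for a trained DANN or a calibrated HEC-HMS run) and all numerical values are hypothetical. The ELT is written here as a weighted mean normalized by the sum of the 1/T weights, which is an assumption; the text only states that 1/T is the weight of each FWLT.

```python
def find_recognition_time(rainfall, forecast_peak, threshold, dt=1.0):
    """Extend T_F by dt until the forecasted peak flow reaches the threshold (TFW).

    rainfall      -- rainfall depth per time step
    forecast_peak -- callable mapping the rainfall observed so far to a peak flow
    threshold     -- threshold flow for warning (TFW)
    Returns T_F in the units of dt, or None if the threshold is never reached.
    """
    for n_steps in range(1, len(rainfall) + 1):
        if forecast_peak(rainfall[:n_steps]) >= threshold:
            return n_steps * dt
    return None


def warning_lead_time(t_recognition, t_exceedance):
    """FWLT: time between risk recognition (T_F) and threshold exceedance (T_E)."""
    return t_exceedance - t_recognition


def expected_lead_time(fwlt_by_return_period):
    """ELT as the 1/T-weighted mean of FWLT (normalization is an assumption)."""
    weights = {T: 1.0 / T for T in fwlt_by_return_period}
    total = sum(weights.values())
    return sum(weights[T] * fwlt for T, fwlt in fwlt_by_return_period.items()) / total


# Illustrative use with a toy forecast model (peak flow = 2 x cumulative rainfall).
toy_model = lambda rain: 2.0 * sum(rain)
t_f = find_recognition_time([1.0, 2.0, 3.0, 4.0], toy_model, threshold=10.0)
fwlt = warning_lead_time(t_f, t_exceedance=15.0)
```

Because the weights are 1/T, frequent floods with short return periods dominate the ELT, while the rare extreme floods contribute little.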

Criteria
The performance of the models can be examined on four criteria: the type and quantity of input data required by each model, flood simulation performance, the length of the FWLT and the ELT. These criteria were used for contrasting the models' capabilities in determining FWLT and ELT, as follows. For the type and quantity of data required by the HEC-HMS and DANN models in the calibration/training, verification and forecasting phases, easily accessible data and a smaller quantity of data are preferred. Therefore, the availability and quantity of data are the indices used to assess the data required in the calibration/training and forecasting phases. Flood simulation performance can be examined by the error of peak flow (EPF) and the root mean square error (RMSE). The former indicates how well the model simulates the peak flow of a flood. The latter assesses how closely the simulated hydrograph follows the observed or target hydrograph. In addition, flood simulation performance can be tested by comparing simulated flood hydrographs with observed ones. Together, these indices and the comparison indicate the performance of a model in forecasting flood hydrographs. FWLT and ELT are used for assessing the performance of the models used in an FFFWS. These indices show the warning lead-time for each return period and on average for the HEC-HMS and DANN models.
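The two flood simulation indices can be computed as in the sketch below. The RMSE follows its standard definition. The EPF is written as the absolute difference between observed and simulated peak flows relative to the observed peak, in percent; this particular form is an assumption, and the hydrograph values are illustrative only.

```python
import math


def error_of_peak_flow(observed, simulated):
    """EPF in percent (assumed form): |Q_p0 - Q_ps| / Q_p0 * 100."""
    q_p0 = max(observed)   # observed peak flow
    q_ps = max(simulated)  # simulated peak flow
    return abs(q_p0 - q_ps) / q_p0 * 100.0


def rmse(observed, simulated):
    """Root mean square error between two hydrographs of equal length."""
    if len(observed) != len(simulated):
        raise ValueError("hydrographs must have the same number of ordinates")
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative hydrograph ordinates (not data from this study).
observed = [0.0, 10.0, 20.0, 10.0]
simulated = [0.0, 12.0, 18.0, 10.0]
epf = error_of_peak_flow(observed, simulated)
error = rmse(observed, simulated)
```

EPF looks only at the peaks, so a model can score well on it while tracking the rising and falling limbs poorly; RMSE captures that overall fit, which is why the two indices are used together.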

Assessment of the models based on required input data
A review of the data required for calibration/training, verification and flood forecasting shows that the DANN model requires more flood hydrograph data for verification than the HEC-HMS model. Whereas the DANN model used five flood hydrographs (Figure 6(d)-(h)), the HEC-HMS model used only one (Figure 6(c)). In the forecasting phase, however, DANN needs less data than the HEC-HMS model (Table 1). As described earlier, the HEC-HMS model requires the watershed's physical characteristics, such as land use data (Figures 1 and 2) and the area and slope of sub-basins, whereas the DANN model does not require those data. Whereas the HEC-HMS model needs one observed hydrograph for calibration (Figure 6(a)), our tests show that the DANN model has to be trained on an extreme flood, the 10,000-year flood (Figure 6(b)), to be able to simulate smaller floods. Moreover, the DANN model needs more simulated flood hydrographs (five) for verification than the HEC-HMS model. The reason is that the DANN model does not incorporate the physical process of rainfall-runoff and should therefore be verified over a range of floods to confirm its ability. On the other hand, the HEC-HMS model needs the watershed's physical characteristics in the calibration, verification and flood forecasting phases, while the DANN model requires only rainfall data for flood forecasting (Table 1). Consequently, preparing the DANN model requires more input data than preparing the HEC-HMS model, but applying it for forecasting is easier.

Calibration/training and verification
Comparing the trained hydrographs of the DANN model with the calibrated ones of the HEC-HMS model shows that the trained hydrographs followed their target hydrographs more closely than the calibrated hydrographs followed the observed ones. Figure 6(a) and (b) compares the calibrated flood hydrograph with the observed flood and the trained flood hydrograph with the simulated 10,000-year flood, respectively. These figures illustrate that DANN training performed better than HEC-HMS calibration. Consequently, DANN performed better in the training phase than HEC-HMS did in the calibration phase.
Comparison of the simulated flood hydrographs of the models with the observed and target flood hydrographs indicates that the HEC-HMS model was verified better than the DANN model. The flood simulation performance of the models can be assessed by comparing their EPF and RMSE. These indices show that the models perform in opposite ways in the training/calibration and verification phases. Table 2 shows these indices for the calibration and verification phases. Both indices are better for the DANN model than for the HEC-HMS model in the calibration phase, whereas they are better for the HEC-HMS model in the verification phase.

Assessment of the flood warning performance of the models
The results show that DANN estimated longer FWLT and ELT than the HEC-HMS model. Figure 7 compares the FWLT for various return periods using the DANN and HEC-HMS models. The figure shows that the FWLTs of the DANN model were longer than those of HEC-HMS. However, the difference between the FWLTs of the two models declines as the return period decreases.
Considering the relatively better verification of the DANN model for higher return periods, this difference can be considered fairly precise. The ELT for the DANN and HEC-HMS models was 12.7 and 10.8 min, respectively. Therefore, assessment of the flood warning performance of the models based on FWLT and ELT indicated that the DANN model estimated these indices to be longer than the HEC-HMS model did.

Conclusion
Flood forecasting is the core of a flood warning system and can be performed by either a CRR or a BBRR model. HEC-HMS, a CRR model, is widely used for flood forecasting, yet its effectiveness in determining FWLT needed to be examined in steep urbanized watersheds. DANN can be used for forecasting floods as an innovative BBRR model, but its efficiency in determining the FWLT of flash floods in steep urbanized watersheds also needed to be examined. Thus, the aim of this paper was to compare the efficiency of HEC-HMS and DANN in determining FWLT in the north of Tehran, a steep urbanized watershed. The performance of the models was assessed on the following criteria: the type and quantity of input data required by each model, flood simulation performance, and the duration of FWLT and ELT. The following major conclusions are derived:
• A review of the data required for calibration/training, verification and flood forecasting shows that the DANN model requires more data for training and verification than the HEC-HMS model. Conversely, in the forecasting phase, DANN requires less data than the HEC-HMS model.
• The trained hydrographs of the DANN model followed their target hydrographs more closely than the calibrated hydrographs of the HEC-HMS model followed the observed ones.
• Comparison of the simulated flood hydrographs of the DANN and HEC-HMS models with the observed and target flood hydrographs indicates that the HEC-HMS model was verified better than the DANN model. Furthermore, the EPF and RMSE indices show that the models performed in opposite ways in the training/calibration and verification phases: the DANN model performed better in the calibration phase, whereas the HEC-HMS model performed better in the verification phase.
• DANN estimated longer FWLT and ELT than the HEC-HMS model.
In conclusion, the comparison of the CRR model, HEC-HMS, with the BBRR model, DANN, shows that, because the BBRR model requires less data for flood forecasting and estimates longer FWLT and ELT, future research should focus on better verification of this kind of model for flood forecasting.