Application of temporal convolutional network for flood forecasting

Rainfall-runoff modeling is a complex nonlinear time-series problem in the field of hydrology. Various methods, such as physical-driven and data-driven models, have been developed to study the highly random rainfall-runoff process. In the past 2 years, with the advancement of computing hardware resources and algorithms, deep-learning methods, such as the temporal convolutional network (TCN), have shown good prospects in time-series prediction tasks. The aim of this study is to develop a prediction model based on the TCN structure to simulate the hourly rainfall-runoff relationship. We use two datasets in the Jingle and Kuye watersheds to test the model under different network structures and compare it with the other four models. The results show that the TCN model outperforms the Excess Infiltration and Excess Storage Model (EIESM), artificial neural network, and long short-term memory and improves the flood forecasting accuracy at different foreseeable periods. It is shown that the TCN has a faster convergence rate and is an effective method for hydrological forecasting.


INTRODUCTION
Floods endanger human lives, hinder sustainable socioeconomic development, and cause inestimable damage to densely populated areas in floodplains or downstream from major rivers (Hu et al. 2018). Precise flood forecasting can better reduce the risk of flooding and provide timely and efficient environmental information for management decisions (Le et al. 2019; Sahoo et al. 2019).
In the past decades, multiple flood forecasting models have been proposed. According to the difference in principle, models for flood forecasting can be divided into process-driven models based on physical mechanisms and data-driven models based on machine learning methods (Douglas-Mankin et al. 2010; Qin et al. 2018; Yuan et al. 2018). Commonly used process-driven models, such as the Xinanjiang model, have been regarded as common techniques for flood process simulation and forecasting (Beven et al. 1984; Zhao 1992; Wang et al. 2012). They usually require complex mathematical formulas, a large amount of hydrological and meteorological data, and an accurate understanding of runoff mechanisms. However, with the diversification of hydrological data and in-depth research on the mechanism of runoff generation and convergence, these process-driven models are restricted by many factors (Kratzert et al. 2018; Tian et al. 2018).

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

METHOD

Temporal convolutional neural network
TCN, a type of convolutional neural network, is applied in many time-series forecasting tasks. It is deliberately kept simple, combining some of the best practices of modern convolutional architectures (Bai et al. 2018). TCN has two specific design features: (1) a 1D fully convolutional network (FCN) architecture keeps the network output the same length as the input sequence (Long et al. 2015); and (2) causal convolutions ensure that the outputs in each layer are influenced only by present and past inputs. Causal convolution differs from standard convolution in that the output at time t does not depend on future values (Wan et al. 2019).
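The two design features can be illustrated with a minimal sketch of a causal 1-D convolution in plain Python. The function name `causal_conv1d`, the left zero-padding, and the toy values are assumptions made for illustration; this is not the authors' implementation.

```python
def causal_conv1d(x, weights):
    """Causal 1-D convolution: y[t] = sum_i weights[i] * x[t - i].

    Positions before the start of the sequence are treated as zeros
    (left padding), so len(y) == len(x) and y[t] never uses x[t+1:].
    """
    k = len(weights)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            if t - i >= 0:  # skip taps that would reach before t = 0
                acc += weights[i] * x[t - i]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.25]  # w[0] weights the present value, w[1] the previous one
y = causal_conv1d(x, w)

# Changing only the last (future) input must leave earlier outputs unchanged.
x_future_changed = [1.0, 2.0, 3.0, 100.0]
y_changed = causal_conv1d(x_future_changed, w)
```

The output has the same length as the input, and the first three outputs are identical in both runs because they never see the modified final value.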
However, simple causal convolution shares a limitation of traditional convolutional neural networks: the length of history that can be modeled is limited by the size of the convolution kernel (van den Oord et al. 2016). Learning longer dependencies in the data therefore requires stacking many layers linearly. To solve this problem, TCN uses one-dimensional dilated convolution. Unlike traditional convolution, dilated convolution samples its input at intervals. Because no pooling operation is involved, this convolution enlarges the receptive field of the network without any loss of resolution.
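A short sketch makes the difference between linear stacking and dilation concrete. It assumes one causal convolution per layer, for which the receptive field is 1 + (k − 1) · Σd; a residual block that contains more than one convolution per dilation level would cover a longer history. The function name is illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal convolutions, one per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


plain = receptive_field(3, [1, 1, 1])    # three undilated layers
dilated = receptive_field(3, [1, 2, 4])  # dilation doubles with depth
```

With the same three layers and kernel size 3, the undilated stack covers 7 time steps, whereas the dilated stack covers 15.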
The complete dilated causal convolution operation F over consecutive layers for a 1-D sequence of a given hydrology input $q = (q^1_0, \ldots, q^1_t)$ and a filter $f: \{0, \ldots, k-1\}$, on element s of the sequence, is defined by the following equation:

$$F(s) = \sum_{i=0}^{k-1} f(i) \cdot q^1_{s - d \cdot i}$$

where d is the dilation factor, k is the filter size, and s − d·i accounts for the direction of the past. As shown in Figure 1, the sampling interval is set by d: in the first layer, d = 1 means that every value is selected, and in the second layer, d = 2 means that the data are selected at intervals (Lara-Benitez et al. 2020). Choosing a larger filter size k or increasing the dilation factor d yields a larger receptive field. As in common dilated convolutions, d increases exponentially with the depth of the network, which allows the network to use a larger effective history for each input. Another common way to further enlarge the receptive field is to connect several TCN blocks. To keep the deeper network structure from complicating the learning process, a residual connection is added to the output (He et al. 2016). Since the input and output have different widths, the residual connection uses a 1 × 1 convolution to ensure that the addition operation receives tensors of the same shape (Bai et al. 2018).

Figure 1 | TCN architectural elements, including a dilated causal convolution with dilation factors of d = 1, 2, and 4 and filter size of k = 3. The receptive field is able to cover all values from the input sequence. A 1 × 1 convolution is added in the residual block when the residual input and output have different dimensions.

Hydrology Research Vol 52 No 6, 1457
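The dilated causal convolution can be sketched directly from its definition, F(s) = Σ f(i) · q at position s − d·i, with positions before the start of the sequence treated as zeros. The function name, the all-ones filter, and the impulse input below are illustrative assumptions chosen so that the reach of the stacked layers is easy to read off; they are not taken from the study.

```python
def dilated_causal_conv(q, f, d):
    """F(s) = sum_{i=0}^{k-1} f[i] * q[s - d*i], zeros before the start."""
    k = len(f)
    out = []
    for s in range(len(q)):
        total = 0.0
        for i in range(k):
            idx = s - d * i  # step back d positions per filter tap
            if idx >= 0:
                total += f[i] * q[idx]
        out.append(total)
    return out


# A single impulse at time 0 shows how far one input value propagates.
impulse = [1.0] + [0.0] * 15
f = [1.0, 1.0, 1.0]  # k = 3, as in Figure 1

h1 = dilated_causal_conv(impulse, f, d=1)
h2 = dilated_causal_conv(h1, f, d=2)
h3 = dilated_causal_conv(h2, f, d=4)

# Last output position still influenced by the input at time 0.
reach = max(s for s, v in enumerate(h3) if v != 0.0)
```

The impulse influences outputs up to 14 steps later and no further, so each output of the third layer sees 15 consecutive inputs, even though every layer only has three filter taps.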

Flood forecasting model based on TCN
TCN is designed for time-series forecasting tasks. The rainfall-runoff process contains a series of time-series data, such as rainfall, evapotranspiration, and runoff. To explore the application of TCN in flood forecasting, this paper established a multi-step time-series forecasting model based on TCN for flood forecasting in the next 1-12 h. In TCN, a moving window scheme is used to create input and output pairs, which are fed into the neural network (Bandara et al. 2020). In the prediction process at time t, the input data from time t − n to t are used to simulate the value at time t + m, where n is the length of the input data, and m is the prediction time in the future. When the window slides to the next moment (time t + 1), the value at time t + m + 1 is simulated using the data at time t − n + 1 to t + 1.
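The moving-window scheme can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `make_windows`, the argument names, and the toy data are hypothetical, and the window is assumed to hold `n_steps` values ending at the current time t, which is one reading of "from time t − n to t".

```python
def make_windows(features, target, n_steps, lead):
    """Build (input window, future target) pairs with a moving window.

    features: list of per-hour feature rows, e.g. [rainfall, flow]
    target:   list of hourly discharge values, same length as features
    n_steps:  number of current and past hours in each input window
    lead:     forecast period m in hours
    """
    inputs, outputs = [], []
    last_t = len(features) - lead  # t + lead must stay inside the series
    for t in range(n_steps - 1, last_t):
        inputs.append(features[t - n_steps + 1 : t + 1])
        outputs.append(target[t + lead])
    return inputs, outputs


hours = 10
features = [[float(h), 0.1 * h] for h in range(hours)]  # toy two-feature rows
flow = [10.0 * h for h in range(hours)]

X, y = make_windows(features, flow, n_steps=6, lead=2)
```

With 10 hourly records, a six-step window, and a 2 h lead, three training pairs result; each time the window slides by one hour, the target slides by one hour as well.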
In this paper, the factors that affect runoff generation and convergence, such as rainfall, evapotranspiration, and runoff data in the watershed, are selected as the input of the model. In addition, the study area is located in the middle reaches of the Yellow River, where the nature of the underlying surface changes significantly within the time span of the data, which affects the rainfall-runoff process in the basin. As a quantity that reflects the characteristics of the underlying surface, the normalized difference vegetation index (NDVI) plays an indispensable role in simulating the relationship between rainfall and runoff. At the same time, deep learning requires a great deal of data as support, and rainfall and flow data alone are not comprehensive inputs. Although the relationship between rainfall and flow is the most obvious and direct, it is difficult for the model to grasp the subtle differences in the flood process from these data alone. Previous studies have taken similar approaches. Therefore, we consider NDVI as another input to enhance the TCN's ability to learn from the data; the flood flow data to be predicted are used as output. These data constitute the dataset for training and testing the model.

Model setting and parameterization
The programming language of choice is Python 3.7, and the libraries used for preprocessing and managing our data are NumPy and pandas. We use the Google Keras deep-learning framework with TensorFlow backend and the NVIDIA RTX 2080Ti GPU to train the models.
First, the normalization operation is performed. Normalization rescales the data from their original ranges to a common scale. When a machine learning model uses gradient descent to find the optimal solution and the input dimensions are not on uniform scales, normalization is often necessary to allow convergence. For efficient learning, all input features (meteorological and hydrological variables) and output data (the discharge) are normalized by subtracting the mean value and dividing by the standard deviation. The output of the network is retransformed to obtain the final discharge prediction, which is the inverse of the scaling process. The expression is as follows:

$$x' = \frac{x - \mu}{\sigma}$$

where x′ and x represent the normalized result and sample data, respectively; μ and σ are the mean and standard deviation of the sample data, respectively.
The purpose of this research is to establish a TCN-based deep-learning flood forecasting model to fully explore the ability of TCN to perceive hydrological data. Due to the high complexity of deep-learning models, finding the optimal TCN network structure and hyperparameters is a crucial task. Therefore, for each foreseeable period, we conducted multiple experiments on the same dataset, building models with different network structures by changing the values of the TCN hyperparameters. The kernel size, number of filters, and number of residual blocks are selected from {4, 6, 8}, {32, 64, 128}, and {1, 2, 3}, respectively. In addition, the dilation factors are fixed to [1, 2, 4, 8, 16, 32], and the length of the input data is fixed to six, which means that a total of six time periods of current and past data are used to predict the future value. All these architectures are then tested for all combinations of the following training parameters: the batch size and the number of epochs are selected from {32, 64, 128} and {20, 50, 100}, respectively.
In total, we conducted 3 × 3 × 3 × 3 × 3 = 243 experiments on the dataset in the research basin and found the best network structure and hyperparameter values for each forecast period.
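The two steps of this section, scaling by mean and standard deviation and enumerating the hyperparameter grid, can be sketched with the Python standard library. The dictionary keys, function names, and toy discharge values are hypothetical, the use of the population standard deviation is an assumption, and no model is trained here; the sketch only shows the round trip of the scaling and the size of the search space.

```python
import itertools
import statistics


def fit_scaler(values):
    """Return the mean and (population) standard deviation of the data."""
    return statistics.mean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


def destandardize(values, mean, std):
    """Inverse scaling, applied to network outputs to recover discharge."""
    return [v * std + mean for v in values]


flow = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy discharge values
mu, sigma = fit_scaler(flow)
scaled = standardize(flow, mu, sigma)
restored = destandardize(scaled, mu, sigma)

# Searched hyperparameters (values from the text) and the fixed settings.
search_space = {
    "kernel_size": [4, 6, 8],
    "filters": [32, 64, 128],
    "residual_blocks": [1, 2, 3],
    "batch_size": [32, 64, 128],
    "epochs": [20, 50, 100],
}
fixed = {"dilations": [1, 2, 4, 8, 16, 32], "input_length": 6}

configs = [
    dict(zip(search_space, combo), **fixed)
    for combo in itertools.product(*search_space.values())
]
```

Each entry of `configs` describes one experiment for a given forecast period; the list has 3⁵ = 243 entries, matching the count stated above.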

Performance evaluation criteria
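In this study, the performance of different models is evaluated by statistical error measures, including the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and bias (Nash & Sutcliffe 1970; Hu et al. 2018; Kratzert et al. 2018). The mathematical expressions of these metrics can be described as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{0,i} - Q_{c,i}\right)^2}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{0,i} - Q_{c,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{0,i} - \bar{Q}_0\right)^2}$$

$$\mathrm{Bias} = \frac{\sum_{i=1}^{n}\left(Q_{c,i} - Q_{0,i}\right)}{\sum_{i=1}^{n} Q_{0,i}} \times 100\%$$

where $Q_0$ (m³/s) and $Q_c$ (m³/s) represent the discharge of the observed and simulated hydrographs, respectively; $\bar{Q}_0$ is the mean value of the observed discharge, and n is the number of data points.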
NSE measures the ability of the model to predict variables different from the mean, gives the proportion of the initial variance accounted for by the model, and ranges from 1 (perfect fit) to −∞. Values closer to 1 indicate more accurate predictions.
The RMSE is very sensitive to the maximum and minimum errors, enabling it to effectively reflect the accuracy of the prediction results. RMSE values closer to 0 provide more accurate predictions.
The bias evaluates the accuracy of the overall water balance of the simulation results and ranges from −100 to 100%. A value close to 0 means more accurate predictions.
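A minimal sketch of the three criteria in plain Python follows. RMSE and NSE use their standard definitions; the bias is written as the relative volume error in percent with the sign convention simulated minus observed, which is an assumption, since the convention used in the study is not stated here. Function names and the toy hydrograph are illustrative.

```python
import math


def rmse(obs, sim):
    n = len(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def nse(obs, sim):
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def bias_percent(obs, sim):
    """Relative volume error in percent (assumed: simulated minus observed)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


observed = [10.0, 20.0, 30.0, 40.0]   # toy discharge values in m3/s
simulated = [12.0, 18.0, 33.0, 37.0]

scores = (rmse(observed, simulated),
          nse(observed, simulated),
          bias_percent(observed, simulated))

# Predicting the observed mean everywhere gives NSE = 0 by construction.
mean_only = nse(observed, [25.0] * 4)
```

The last line illustrates why NSE is read against the mean: a model that always outputs the observed mean scores 0, a perfect model scores 1, and anything worse than the mean goes negative.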

Model benchmarks and methods
We used several benchmarks to evaluate the performance of the TCN model. These benchmarks include LSTM, ANN, and EIESM (physical model). In addition, in order to explore the influence of NDVI as an input on the TCN model, we also added the TCN model without NDVI as a benchmark. The forecast period was set to 1-12 h, which is the premise of comparing the performance of different benchmarks. We simultaneously calculate and compare the performance evaluation indicators of all benchmarks.
LSTM is a recurrent neural network (RNN) with a special memory unit structure (Hochreiter & Schmidhuber 1997). In the memory cell unit, the input information passes through the forget gate, input gate, and output gate in turn. Some information is selected to be forgotten, while other information is selected to be added to the memory, which mitigates the exploding and vanishing gradient problems of traditional recurrent neural networks. Research in recent years has shown that LSTM is widely used in modeling time-series forecasting tasks in the hydrological field, such as flow forecasting and water quality forecasting assessment (Kratzert et al. 2018). LSTM was used as a benchmark in this study.
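The gate sequence can be illustrated with one step of a scalar LSTM cell (input size 1, hidden size 1) using the standard LSTM update equations. The parameter names and values are arbitrary placeholders, not trained weights, and the sketch is not the benchmark implementation used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell with parameters in dict p."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g   # forget part of the old memory, add new content
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


names = ("wf", "uf", "bf", "wi", "ui", "bi",
         "wg", "ug", "bg", "wo", "uo", "bo")
params = {name: 0.5 for name in names}  # placeholder weights

h, c = 0.0, 0.0
for x in [0.2, 0.8, 0.1]:  # toy normalized rainfall sequence
    h, c = lstm_step(x, h, c, params)
```

Because each step needs `h` and `c` from the previous step, the loop over time cannot be parallelized, in contrast to a convolution that processes all time steps of a layer at once.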
ANN is a simplified black-box model used to solve several water resources problems and can be trained with datasets to identify complex nonlinear relationships between inputs and outputs (Tokar & Johnson 1999). The representative feedforward neural network is composed of an input layer, hidden layers (containing neurons), and an output layer. Recent studies have shown that ANN is one of the most significant methods for simulating hydrological processes (Ahmad & Hussain 2019). ANN was also used as another benchmark in this study.
EIESM is a physical model that improves on the Xinanjiang model (Hu et al. 2003). Compared with the Xinanjiang model, the excess infiltration runoff mode of EIESM is based on the infiltration curve and infiltration capacity distribution curve of the watershed, and the excess storage runoff mode is based on the water storage capacity distribution curve of the watershed. The two types of runoff modes are organically combined. Recent studies report that runoff simulated by EIESM falls within an acceptable accuracy range, as reflected by goodness-of-fit measures (Wen et al. 2020). EIESM was used as a benchmark in this study.

CASE STUDY
The Yellow River, the fifth largest in the world, often suffers from flood disasters. In recent years, a large number of water and soil conservation measures have been applied in the middle reaches of the Yellow River. This study selected the representative Jingle and Kuye watersheds. The first TCN model was developed for the Jingle watershed of the Fenhe River in Shanxi Province, a relatively small watershed that covers 2,799 km². Jingle hydrological station is the primary stream control station on the upper Fenhe River and is located at 111°55′ east longitude and 38°20′ north latitude. The annual mean precipitation in the Jingle watershed is approximately 538.38 mm. The frequent, devastating flooding of the last few decades has been widely researched.
An additional assessment was conducted in the Kuye watershed to discover whether the proposed model architecture operates in different watersheds after training. The watershed covers 8,706 km², spans Shanxi and Henan provinces, and is narrow with a long concentration time. The Wenjiachuan station is located at 110°45′ east longitude and 38°26′ north latitude. Annual precipitation in the two watersheds varies greatly, and both are severely affected by flooding. Figure 2 shows the locations of the Jingle and Kuye watersheds.
The underlying data for our study of the Jingle watershed include hourly discharge data from the Jingle station and hourly rainfall data from 14 gauges in the area. [...] are used as the training set and 20 events (2003-2016; 3,479 datasets) as the validation set. In this paper, a typical flood process, with a large volume flow and duration, is selected to verify the performance of the established model.

TCN network hyperparameter experiment
In this study, after experiments with different combinations of TCN models on the Jingle watershed dataset, we finally selected the best TCN model construction scheme based on the loss function (Loss) of the prediction results on the verification set for different forecast periods, including kernel size, filters, residual blocks, batch size, and epochs. Table 1 illustrates the best model construction in each forecast period.
As can be seen from Table 1, there is no obvious difference between the constructions for different forecast periods. The number of filters and the kernel size are mostly selected as 128 and 8, respectively. The number of residual blocks always works best when the value is 2. These results are related to the characteristics of the given input hydrological data. TCN adjusts the receptive field by changing the above three hyperparameters so that the internal learning matches the rainfall-runoff process. The batch size remains unchanged at 128, and the number of epochs increases with the forecast period. As the time interval between input and output data increases, the convolutional network requires more training epochs to capture the relationship in the data.

Understanding TCN in hydrology with model evaluations
In the above research, we discussed the optimal network structure and hyperparameters for TCN to model the rainfall-runoff process. To explore and compare the hydrological process simulation performance of the TCN model in depth, we ran the same experiment with the other four models, namely TCN (without NDVI input), LSTM, ANN, and EIESM, and simulated the same forecast periods of 1-12 h as for TCN. Table 2 illustrates the evaluation index of runoff forecasting at different forecast periods (1-12 h) by the four models. The changes in the three indicators reflect the flood simulation accuracy on the training and testing sets. At the same time, to reflect the performance of the models on the test set more intuitively, we also calculate the evaluation indicators of all flood events on the test set. Figure 3 shows the boxplots of different tests. It can be seen that the results of all models are closely related to the forecast period, and the prediction accuracy decreases as the forecast period increases. In the modeling process, the forecast period represents the time interval between input and output data. A longer forecast period makes the target value harder to predict. In terms of flow formation, future flow is affected by current or earlier rainfall and other factors. As the target moves further into the future, the existing data provide a progressively weaker reference.
In the physical model, the results show that the NSEs of EIESM are 0.8848 (training set) and 0.8610 (testing set) at a forecast period of 1 h but fall to 0.7203 (training set) and 0.7106 (testing set) at a forecast period of 12 h. The physical model uses the principle of rainfall-runoff to calculate the flow, but the overall accuracy is low due to incomplete consideration of the process and systematic error. The results of ANN show that NSE varies from 0.9 to 0.6 for forecast periods from 1 to 12 h. The simulation quality declines rapidly as the foreseeable period increases. ANN is a relatively simple neural network trained with supervised backpropagation. As an earlier machine learning model, ANN cannot capture the sequential information in the input that is needed to process time-series data.
Compared with the previous two models, LSTM, TCN, and TCN (NDVI) simulate the process well and meet the needs of flood forecasting. The forecasting ability of the two TCN models is higher than that of LSTM at almost every forecast period, especially for long periods close to 6 h. The performance of the three models declines as the forecast period increases. For forecast periods of less than 6 h, the NSE of the TCNs is higher than that of LSTM, and the RMSE and bias are lower. When the forecast period exceeds 6 h, the prediction accuracy of LSTM decreases sharply, whereas the NSE of the TCNs remains above 0.7, and the RMSE and bias remain below 100 and 40%, respectively. Finally, of the two TCN models with different inputs, the model with NDVI input simulates better, with higher NSE and lower RMSE and bias at all forecast periods. The input including NDVI better reflects the true characteristics of the watershed, so that TCN can learn the relationship between the data more fully. At the same time, deep learning requires a large amount of data as support, so adding the NDVI sequence is beneficial for TCN fitting. Among the five models, the proposed TCN model shows the highest accuracy compared with conventional machine learning and physical models.
To evaluate the ability of the TCN model to forecast the flood process, four flood events were selected, of which the first three are large flood events in the verification set, and the last one is a smaller and more common flood in the verification set. We use the previously mentioned model containing the best TCN structure and training hyperparameters to simulate the flood process. Figure 4 shows the observed and estimated hydrographs of the four flood events at forecast periods of 1, 6, and 12 h.
Flood events 1 and 2 were considered abnormal events in the verification set. The rainfall that formed these floods was high in intensity and short in duration, and the peaks were sharp. At the forecast period of 1 h, the values predicted by the EIESM model for the peak and low-flow sections are higher than the actual values. ANN fluctuates significantly, especially before the flood peak occurs. However, the forecast runoff curves of LSTM and the TCNs both fit the observed runoff curve well. They predict the backwater stages in good agreement with the actual process. The flood peaks forecasted by the TCNs are more realistic than those of LSTM, indicating that the TCNs are more sensitive to rainfall and runoff processes. Although there is no obvious difference between TCN and the proposed TCN (NDVI), the latter fits the true value more accurately in the preceding section. At the forecast period of 6 h, the performance of all models has a certain degree of degradation. Although EIESM expresses the correct flood process trend, the overall accuracy is insufficient. Moreover, the forecasting ability of ANN and LSTM deteriorates significantly, underestimation of flood peaks becomes more frequent, and the simulated values fluctuate abnormally compared with the observed values. ANN has large fluctuations and abnormal values, which may be due to its insufficient ability to learn long-term data.
The flood peak forecasted by LSTM occurs later than the observed flood peak, which would seriously affect flood warning. The TCNs simulate rainfall-runoff better, forecast floods well, and have higher accuracy than the other models. Between the two TCN models, the model with NDVI input has a better grasp of where the flood peak appears and fluctuates less. When the forecast period exceeds 6 h, because the hydrological data that will form the future runoff are not yet available, the forecast flow curves of all models lag well behind the observations. Even so, the TCNs are the least affected, and the results are still practical.
The results also show that considering NDVI as an input can effectively improve forecast accuracy. The last two floods, which are normal flood events, have lower peak flows than the first two events. Both figures show that the overall performance of EIESM is stable and insensitive to the forecast period. Conversely, changes in the forecast period are likely to cause fluctuations in the neural network models. TCN (NDVI) best predicted the hourly peak flow, whereas the other models were insufficient to predict the values and had lower forecast accuracy.

Figure 3 | Boxplots of different tests. Each box is calculated from four realizations of model runs. It shows the average NSE for the prediction with different forecast periods of 1, 6, and 12 h.

Figure 4 | Observed and estimated hydrographs of the five models at the validation stage (flood events 1, 2, 3, and 4) with different lead times of 1, 6, and 12 h.

Model application in a different watershed
To evaluate the practicality of the model structure, we applied the established TCN model to the Kuye watershed and selected data of the same length as in the previous experiment. The Kuye watershed is larger and narrower than the Jingle watershed and has a longer travel time and different topography, soil type, and land use. TCN network modeling used the hyperparameter combination obtained from the repeated experiments: the number of filters, kernel size, and number of residual blocks are selected as 128, 8, and 2, respectively, and the dilation factors are [1, 2, 4, 8, 16]. Table 3 shows the results based on different forecast periods for four models. In the verification set, the NSEs of the EIESM, ANN, LSTM, TCN, and TCN (NDVI) models are 0.8416, 0.9463, 0.9787, 0.9798, and 0.9819 at a forecast period of 1 h, respectively. With extended forecast periods, the simulation accuracy of all the models declined to varying degrees. Among them, EIESM and ANN have the most obvious downward trend. Although LSTM maintains a high level of prediction, it is always lower than the TCNs. The performance of TCN without NDVI input still lags behind TCN (NDVI). The final TCN (NDVI) model consistently outperformed the other models. These results are consistent with those in the Jingle watershed, confirming that the TCN (NDVI) model established in this study responds relatively smoothly to changes in watershed attributes and can be used to make accurate predictions in multiple watersheds.

CONCLUSIONS
At present, more and more deep neural network methods are applied to rainfall-runoff prediction. On the one hand, with the advancement of technology, the means of obtaining data are more intelligent, and the types of data available are more diverse, such as soil evapotranspiration, wind speed, and pressure in the watershed. Some data that cannot be directly observed can be obtained by the inversion of remote sensing images. Progress in data requires a framework that can consider multiple factors to interpret the hidden relationships between data. On the other hand, the rainfall-runoff process contains many complicated steps. The runoff mechanism in semi-humid, semi-arid areas is more complicated than in humid areas (Le et al. 2019). The physical model is not enough to reflect the complete mechanism. Therefore, the deep neural network has become an effective new way of simulating rainfall and runoff. As a convolution-based time-series prediction model proposed in the past 2 years, TCN has been successfully applied in some fields. This paper proposes a TCN learning model for rainfall-runoff prediction, which uses one-dimensional convolution operations to process input and output sequences. The TCN model uses the observed rainfall data, evaporation, and NDVI data from rainfall stations in the basin as input, and the outlet section flow as output.
In the process of modeling, some hyperparameters in the structure of the TCN neural network need to be considered, such as the kernel size, number of filters, and number of residual blocks. Inappropriate hyperparameter combinations will cause the prediction results to deviate from the true value. For each forecast period of 1-12 h, 243 different hyperparameter combinations were tested in the Jingle watershed, and finally, the set of results with the best prediction effect was selected. Several characteristics of the best combinations stand out. First, the best combination changes little across forecast periods. The kernel size and number of filters fluctuate between 6 and 8 and between 64 and 128, respectively, and the number of residual blocks remains at 2. In addition, some of the hyperparameters used for training, such as the batch size, are stable at 128. At the same time, TCN needs more iterations to complete the fitting for a longer forecast period. These characteristics of the hyperparameter combinations are considered to be useful for rainfall-runoff simulation.
In this study, we use the TCN network to model two watersheds. The experimental results show that the proposed TCN method can better simulate the rainfall-runoff process, has a better prediction effect than other models, and has less deviation at long forecast periods. Traditional physical models and early machine learning models, such as EIESM and ANN, generally have low simulation accuracy. LSTM meets the forecast requirements at short forecast periods but loses more accuracy as the forecast period lengthens, significantly underestimating flood peaks and showing abnormal fluctuations. It is worth noting that the TCN model with NDVI input has a certain ability to reflect the characteristics of the underlying surface, which leads to better simulation performance on the selected watershed. This provides a new idea for subsequent deep learning in hydrological forecasting tasks: inputting different types of basin data that have the potential to affect the flow. At the same time, the simulation in the Kuye watershed showed the stability of the proposed TCN model, and the selected hyperparameter combinations can be applied to a new watershed.
Using TCN to model the rainfall-runoff process can capture the relationship between hydrological data. In TCN, because the same filter is applied across each layer of the network, convolution can complete the prediction task in parallel, while an RNN such as LSTM can only process information in one direction; that is, it must wait for the forward pass of the previous time step to complete before starting the forward pass of the next time step. A deep LSTM network therefore requires a larger amount of calculation, which makes the prediction task time-consuming. In addition, the adjustable kernel size and dilation factor allow the receptive field to be tuned, which enables TCN to handle tasks with varying degrees of complexity. In the experiment, we found that TCN has a faster convergence rate than LSTM and occupies less memory, which is valuable for prediction. These results also demonstrate the strong potential of applying deep-learning methods to other hydrological problems, specifically other time-series tasks.
The hydrological process contains complex and diverse variables. More variables that affect the target value can be used as input to enhance the deep-learning model's cognition of physical processes in the future, such as soil moisture and wind speed. In addition, deep neural network methods like TCN often have many hyperparameters. At present, we use repeated experiments to determine the best combination of them, which has limitations. If it can be combined with more intelligent parameter optimization methods, it will greatly improve the efficiency of obtaining the desired results.