Prediction of Water Level and Water Quality Using a CNN-LSTM Combined Deep Learning Approach

A combined deep learning approach based on Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks was developed to simulate the water level and water quality, including total nitrogen, total phosphorus, and total organic carbon. Water level and water quality data in the Nakdong river basin were collected from the Water Resources Management Information System (WAMIS) and the Real-Time Water Quality Information, respectively. The rainfall radar image and operation information of the estuary barrage were also collected from the Korea Meteorological Administration. In this study, the CNN was used to simulate the water level and the LSTM was used for water quality. The entire simulation period was 1 January 2016–16 November 2017 and was divided into two parts: (1) calibration (1 January 2016–1 March 2017); and (2) validation (2 March 2017–16 November 2017). This study revealed that the performances of both the CNN and LSTM models were in the “very good” range, with Nash–Sutcliffe efficiency values above 0.75, and that the models represented the temporal variations of the pollutants in the Nakdong river basin (NRB) well. It is concluded that the proposed approach can be useful for accurately simulating the water level and water quality.


Introduction
Rivers are one of the main sources of freshwater for domestic, industrial, and agricultural use. However, these water sources are often limited in many regions. The optimization of water resources management should take into account both quantity and quality: it is critical not only to optimize water distribution among the domestic, agricultural, and industrial sectors but also to maintain pollution levels within permissible limits.
To predict surface water quality, process-based models such as the Soil and Water Assessment Tool (SWAT [1]) and Storm Water Management Model (SWMM [2]) have been widely used. For example, Baek et al. [3] improved the low-impact development module in the SWMM model to accurately simulate total suspended solids (TSS), chemical oxygen demand (COD), total nitrogen (TN) and total phosphorus (TP) in an urban watershed in the Republic of Korea (hereafter South Korea). Even though these conventional process-based models are capable of accurately simulating water quality, they often require large amounts of input data and many parameters, which entail high computational costs. However, these datasets are not always available [4]. Furthermore, these limitations may become substantially larger for a river basin with complex hydraulic structures and various water uses, because the input data and parameters for all of these processes in a complex basin are practically impossible to obtain.

Water Level and Quality Simulation
In this study, CNN and LSTM networks were combined to predict the water levels and water quality concentrations, respectively. CNN and LSTM are among the most common deep learning (DL) algorithms and have been applied to various fields (e.g., image recognition, transition, and speech analysis) [15,16]. Specifically, CNN has been developed to recognize patterns of image features [17], while LSTM has been widely used for identifying patterns in sequential data such as time series [18]. Figure 2 shows our CNN model architecture with two inputs having different shapes: multi-dimensional and single vector data. The multi-dimensional data consisted of the rainfall radar image with dimensions of 251 × 141 (Figure 2a), while the single vectors supplied additional information such as the water level on the previous day, the average water level over the past three days, the temperature, the operation information of the estuary barrage, and the evaporation (Figure 2b). Because the Nakdong River Estuary Barrage is close to the water level monitoring site (Figure 1), we assumed that the water level of the barrage can influence the water level at the site; therefore, the water levels of this one control structure were used in this study. The CNN model consisted of three convolutional layers, two max-pooling layers, and two fully connected layers. The output image from the convolutional layers and the single vector data were fed into a fully connected layer that converts them into a one-dimensional feature vector (Figure 2c) [19]. The output of the fully connected layer was the water level. More detailed descriptions of each layer in the CNN are given in Section 2.3. A schematic diagram of the LSTM is shown in Figure 3. The input data of this model were the water level and water quality concentrations at the previous time step. This structure comprised LSTM and fully connected layers. The output from the LSTM layers was transferred to a fully connected layer, which generated the water quality concentrations. More details on the LSTM layers are given in Section 2.4. Both the CNN and LSTM models used the mean square error (MSE) as the loss function.
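As a rough illustration of how the single-vector input described above could be assembled, the following sketch builds one auxiliary feature vector per day from daily series. The function name `build_vector_inputs`, the argument names, and the choice to align temperature, barrage level, and evaporation with the prediction day are assumptions made for illustration; the study does not specify its preprocessing code.

```python
from statistics import mean

def build_vector_inputs(levels, temperature, barrage_level, evaporation):
    """Return one auxiliary feature vector per day, starting at day index 3.

    All arguments are equal-length daily series (hypothetical layout).
    """
    rows = []
    for t in range(3, len(levels)):
        rows.append([
            levels[t - 1],            # water level on the previous day
            mean(levels[t - 3:t]),    # average water level over the past three days
            temperature[t],           # assumed to be aligned with day t
            barrage_level[t],         # estuary barrage information for day t
            evaporation[t],
        ])
    return rows

# Hypothetical daily values, used only to show the shape of the result.
levels = [1.60, 1.62, 1.64, 1.70, 1.66]
temperature = [10.0, 11.0, 12.0, 13.0, 14.0]
barrage_level = [0.90, 0.91, 0.92, 0.93, 0.94]
evaporation = [2.0, 2.1, 2.2, 2.3, 2.4]
rows = build_vector_inputs(levels, temperature, barrage_level, evaporation)
```

Each row would be concatenated with the flattened convolutional output before the fully connected layers; the first three days are dropped because the three-day average is undefined for them.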

Convolutional Neural Network (CNN)
CNN recognizes patterns that represent image features by utilizing convolutional layers [17]. CNNs can receive images or multi-dimensional matrices, and the neurons in a CNN are connected to a small region of the previous layer. This algorithm can reduce computations and prevent overfitting problems [20]. Therefore, CNN has been adopted in numerous studies focusing on objects in digital images [21]. A convolutional layer has the filter size, padding, and stride as its layer parameters [21]. A filter of a specific size (e.g., FH: filter height, FW: filter width) moves across the input image [22]. The padding inserts zero values around the input image, which prevents the loss of information during feature extraction [23]. The stride defines the step size of the filter in convolutions [24]. In each convolutional layer, the output size is calculated as:
$$\mathrm{OH} = \frac{\mathrm{IH} + 2\,\mathrm{PH} - \mathrm{FH}}{\mathrm{SH}} + 1, \qquad \mathrm{OW} = \frac{\mathrm{IW} + 2\,\mathrm{PW} - \mathrm{FW}}{\mathrm{SW}} + 1$$
where OH is the height of the output, IH is the height of the input, FH is the height of the filter, SH is the stride in the height direction, PH is the height of the padding, OW is the width of the output, IW is the width of the input, FW is the width of the filter, SW is the stride in the width direction, and PW is the width of the padding [22].
Water 2020, 12, 3399
In general, the convolutional layer needs an activation function to transform the signal from linear to non-linear. The rectified linear unit (ReLU) is employed as the activation function in this study. This function improves the computational speed and accuracy compared with other activation functions (e.g., the tangent sigmoid function) [25]. In particular, the ReLU function mitigates the vanishing gradient problem, in which the training gradient decreases exponentially. The ReLU function is defined as:
$$f(x) = \max(0, x)$$
where f(x) is the output of ReLU and x is the input signal. The max-pooling layer was used to extract the invariant features with an efficient convergence rate.
This layer eliminates non-maximal values through non-linear downsampling, which reduces the computational load during the CNN process [26]. The fully connected layer vectorizes the input signal and connects it to a loss function that calculates the errors between the observed and simulated values [27]. The MSE is used as the loss function in our study [28,29]. The MSE is defined as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - O_i\right)^2$$
where $Y_i$ is the simulated result, $O_i$ is the observed data, and N is the number of data points. Stochastic gradient descent (SGD) optimization was applied to train the CNN. SGD optimizes the parameters of the CNN by reducing the loss function, as:
$$\vartheta = \arg\min_{\vartheta} \frac{1}{N}\sum_{i=1}^{N} \ell\left(x_i, \vartheta\right)$$
where ϑ is the network parameter, x is the training dataset, N is the number of data points, and ℓ is the loss function.
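To make the layer arithmetic in this section concrete, the sketch below applies the usual output-size relation, (input + 2 × padding − filter) / stride + 1, to the 251 × 141 radar image, together with minimal ReLU and max-pooling helpers. The 3 × 3 filter, padding of 1, stride of 1, and 2 × 2 pooling window are assumed values chosen for illustration only; the filter and pooling sizes used in the study are not stated here. Floor division is used, as is common when the division is not exact.

```python
def conv_output_size(i, f, p, s):
    """Output length along one axis: (I + 2P - F) / S + 1, with floor division."""
    return (i + 2 * p - f) // s + 1

def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)

def max_pool(image, size=2, stride=2):
    """Keep only the maximum of each size x size window of a 2-D list."""
    out_h = conv_output_size(len(image), size, 0, stride)
    out_w = conv_output_size(len(image[0]), size, 0, stride)
    return [
        [
            max(
                image[r * stride + dr][c * stride + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Assumed layer settings applied to the 251 x 141 radar image.
conv_h = conv_output_size(251, f=3, p=1, s=1)   # padding of 1 preserves the height
conv_w = conv_output_size(141, f=3, p=1, s=1)
pool_h = conv_output_size(conv_h, f=2, p=0, s=2)
pool_w = conv_output_size(conv_w, f=2, p=0, s=2)

pooled = max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

With these assumed settings, padding keeps the convolution output the same size as its input, and each pooling step roughly halves both dimensions, which is where most of the reduction in computation comes from.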
Deep learning models such as CNN and LSTM require an epoch number, a batch size, and a learning rate as hyperparameters for model training. The epoch number is the number of passes over the entire training dataset, while the batch size is the number of samples processed at a time during training [30]. The learning rate is the step size at each iteration used to minimize the loss function. In this study, the epoch number and mini-batch size of the CNN were 1000 and 16, respectively, and the learning rate was 0.001.
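The interplay of epoch number, batch size, and learning rate can be sketched with mini-batch SGD on an MSE loss. The example below uses the hyperparameter values quoted above (1000 epochs, batch size 16, learning rate 0.001) but replaces the CNN with a toy one-variable linear model on synthetic data, so it is only a sketch of the training loop, not the study's implementation. The helper names and the assumption of one training sample per calendar day are illustrative.

```python
import math
import random
from datetime import date

def mse(simulated, observed):
    """Mean square error between simulated and observed values."""
    return sum((y - o) ** 2 for y, o in zip(simulated, observed)) / len(observed)

def updates_per_epoch(n_samples, batch_size):
    """Number of parameter updates needed to pass over the data once."""
    return math.ceil(n_samples / batch_size)

def train_sgd(xs, obs, epochs=1000, batch_size=16, lr=0.001, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradients of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * xs[i] + b - obs[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - obs[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic data (not from the study): obs = 2x + 1.
xs = [i / 63 for i in range(64)]
obs = [2.0 * x + 1.0 for x in xs]
initial_loss = mse([0.0 for _ in xs], obs)
w, b = train_sgd(xs, obs)
final_loss = mse([w * x + b for x in xs], obs)

# Assuming one sample per day over the calibration period (inclusive).
n_days = (date(2017, 3, 1) - date(2016, 1, 1)).days + 1
steps = updates_per_epoch(n_days, 16)
```

Under the daily-sample assumption, the calibration period spans 426 days, so a batch size of 16 gives 27 updates per epoch, the last batch being smaller than the others.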

Long Short-Term Memory (LSTM)
The LSTM network is an extension of the recurrent neural network (RNN). RNN adopts a directed cycle structure that transfers the output of a hidden layer to the same hidden layer [31]. This structure can identify features of time series by receiving the signal of the previous time step. However, RNN suffers from the vanishing gradient problem, resulting in unacceptable accuracy [18]. This vanishing gradient problem was overcome in the LSTM suggested by Hochreiter and Schmidhuber [32]. The cell states are updated by a gating mechanism consisting of three different gates (forget gate, input gate, and output gate) and cells that are connected to each element. The following equations were used in LSTM:
$$\begin{aligned}
\tilde{c}_t &= \tanh\left(W_c\,[a_{t-1}, x_t] + b_c\right) \\
\Gamma_i &= \delta\left(W_i\,[a_{t-1}, x_t] + b_i\right) \\
\Gamma_f &= \delta\left(W_f\,[a_{t-1}, x_t] + b_f\right) \\
\Gamma_o &= \delta\left(W_o\,[a_{t-1}, x_t] + b_o\right) \\
c_t &= \Gamma_i \odot \tilde{c}_t + \Gamma_f \odot c_{t-1} \\
a_t &= \Gamma_o \odot \tanh\left(c_t\right)
\end{aligned}$$
where $\tilde{c}_t$ is the candidate cell state vector, $a_{t-1}$ is the activation at the previous time step t − 1, $x_t$ is the input at the current step t, δ is an element-wise non-linear activation function, $\Gamma_i$ is the input gate, $\Gamma_f$ is the forget gate, $\Gamma_o$ is the output gate, $c_t$ is the cell state at the current step t, and ⊙ denotes element-wise multiplication. The bias and weight matrices are represented as b and W, respectively.
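The gating mechanism can be illustrated with a single LSTM step written out by hand. The sketch below assumes the standard LSTM cell update (sigmoid gates, tanh candidate and output squashing) and uses scalar states for brevity; the parameter layout in `params` is hypothetical, and the exact form used in the study may differ.

```python
import math

def sigmoid(z):
    """Logistic function, used here as the gate activation."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM time step with scalar input, activation, and cell state.

    params maps each of "c", "i", "f", "o" to a tuple (w_a, w_x, b).
    """
    def affine(name):
        w_a, w_x, b = params[name]
        return w_a * a_prev + w_x * x_t + b

    c_tilde = math.tanh(affine("c"))   # candidate cell state
    gate_i = sigmoid(affine("i"))      # input gate: how much candidate to admit
    gate_f = sigmoid(affine("f"))      # forget gate: how much old state to keep
    gate_o = sigmoid(affine("o"))      # output gate: how much state to expose
    c_t = gate_i * c_tilde + gate_f * c_prev
    a_t = gate_o * math.tanh(c_t)
    return a_t, c_t

# With all weights and biases at zero, every gate equals 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0) for name in "cifo"}
a_t, c_t = lstm_step(x_t=0.3, a_prev=0.0, c_prev=1.0, params=zero_params)
```

Because the new cell state is a gated sum of the old state and the candidate rather than a repeated matrix product, gradients can flow across many time steps, which is how this structure addresses the vanishing gradient problem noted above.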

Performance Evaluation
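The accuracies of the predicted water level, TN, TP, and TOC were evaluated using the coefficient of determination (R²), Nash–Sutcliffe efficiency (NSE), and mean square error (MSE). R² and NSE are defined as follows:
$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$
where n is the number of data points for the water level (m), TN (mg/L), TP (mg/L), and TOC (mg/L), $P_i$ is the predicted value, $\bar{P}$ is the mean of the predicted values, $O_i$ is the observed value, and $\bar{O}$ is the mean of the observed values.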
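The three evaluation metrics can be computed directly from paired predicted and observed series. The sketch below uses the common definitions of R² (squared Pearson correlation), NSE, and MSE; the function names and the small data series are illustrative and not taken from the study.

```python
def r_squared(pred, obs):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    p_bar = sum(pred) / n
    o_bar = sum(obs) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def nse(pred, obs):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    o_bar = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - residual / spread

def mse(pred, obs):
    """Mean square error."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

# Hypothetical series: the prediction has the right shape but a constant bias of +1.
obs = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]
r2_biased = r_squared(biased, obs)
nse_biased = nse(biased, obs)
mse_biased = mse(biased, obs)
```

The biased series shows why the metrics are reported together: R² stays at 1 because the pattern is perfectly correlated, whereas NSE drops to 0.2 and MSE rises to 1 because both penalize the constant offset.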

Monitoring of Water Level and Water Quality
The results of the descriptive statistical analyses for the water level, TN, TP, and TOC are summarized in Table 1. In this study, the minimum values of water level, TN, TP, and TOC were 1.19 m, 1.104, 0.003, and 2.100 mg/L, respectively, while the maximum values were 3.11 m, 4.383, 0.061, and 5.900 mg/L. The ratio of TN to TP (hereafter TN:TP ratio) was calculated using the minimum and Q2 values of TN and TP, respectively. The TN:TP ratio using the minimum values was 368 and that using the Q2 values was 150.62. The TN:TP ratio is an indicator of phytoplankton nutrient limitation [33]. These values were much higher than the TN:TP ratio of 22, indicating that the NRB was in phosphorus-limited conditions [34,35]. The mean water level, TN, TP, and TOC were 1.65 m, 2.465, 0.021, and 3.202 mg/L, respectively. The median values of water level and TOC were close to their mean values, while the median values of TN and TP were appreciably different from their mean values. The standard deviation of TN was the highest among the pollutants, and TP had the highest coefficient of variation. Both statistics are commonly used to quantify the variation of data. However, the coefficient of variation is more appropriate for comparing the variations of the pollutants because it expresses the variation of independent data without regard to the unit [36,37]. The Q1 and Q3 values of the water level were 1.60 and 1.69 m, while the maximum water level was 3.11 m, indicating that our data included extreme values. This might be caused by heavy rainfall that can provoke floods [38]. The validation sets of water level and TOC had smaller ranges than the corresponding training sets, while those of TN and TP had ranges similar to the training sets. The standard deviation of the training set was larger than that of the validation set without TSS.
Figure 4 presents a comparison between the observed and simulated water levels.
The water levels simulated by the CNN model showed good agreement with the observed water levels. The R² values between simulation and observation were 0.934 and 0.923 for the training and validation steps, respectively, while the MSE values were 0.001 and 0.001 m² (Table 2). The NSE values in the training and validation steps were 0.926 and 0.933, which are within the "very good" performance range (0.75 to 1) proposed by Moriasi et al. [39]. These values are in substantial agreement with those of Bustami et al. [40] and Panda et al. [41]. Bustami et al. [40] simulated water levels of the Bedup river in Malaysia using an artificial neural network (ANN) technique, which resulted in an R² value of 0.92. Panda et al. [41] simulated the water levels of the Mahanadi delta using MIKE and ANN, obtaining an R² value of 0.921.
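The TN:TP ratio and the coefficient of variation discussed in the monitoring results are simple to reproduce. The sketch below recomputes the ratio from the reported minimum TN and TP values and shows why the coefficient of variation is unit-independent; the function names, the use of the sample standard deviation, and the three TP values in the second half are assumptions for illustration.

```python
from statistics import mean, stdev

P_LIMITATION_THRESHOLD = 22.0  # TN:TP ratio quoted in the text

def tn_tp_ratio(tn_mg_l, tp_mg_l):
    """Ratio of total nitrogen to total phosphorus (both in mg/L)."""
    return tn_mg_l / tp_mg_l

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)

# Reported minimum TN and TP concentrations (mg/L).
ratio_min = tn_tp_ratio(1.104, 0.003)
phosphorus_limited = ratio_min > P_LIMITATION_THRESHOLD

# Hypothetical TP values, expressed in mg/L and then in ug/L.
tp_mg = [0.010, 0.020, 0.030]
tp_ug = [v * 1000 for v in tp_mg]
cv_mg = coefficient_of_variation(tp_mg)
cv_ug = coefficient_of_variation(tp_ug)
```

Changing the unit rescales the standard deviation and the mean by the same factor, so the coefficient of variation is unchanged, which is what makes it suitable for comparing pollutants with very different concentration ranges.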

Water Level Simulation
The water level fluctuated in the rainy season, which runs from June to October, while its variation was low in the dry season. This can be explained by the fact that rainfall was one of the most influential factors for the water level, in that an increase in rainfall increased the water level [42]. Specifically, in the rainy season of 2016, the water level reached 3.11 m, the highest value for the entire study period. The highest rainfall (407.7 mm) occurred in September 2017. This heavy rainfall could result in a higher peak flow [38]. The CNN model in this study captured this phenomenon well, indicating that the model can simulate extreme water levels. The simulated results also showed relatively higher water levels in the rainy season of 2017, which were very similar to the observations. The water levels between the end of September and early October in 2016 were much higher than those for the same period in 2017. One possible explanation is that, during that period in 2016, Typhoon Chaba, one of the strongest tropical cyclones to make landfall in South Korea, had a great impact on the Korean peninsula with a large amount of precipitation.
Figure 5 shows the comparison between the observed pollutant values and the simulated results of the LSTM model. The R² values of TP and TN for the training period were 0.92 and 0.95, respectively, while those in the validation period were 0.87 and 0.97, respectively (Table 2). TOC had the lowest R² value among the pollutants for both the training and validation periods, with 0.86 and 0.79, respectively. The MSE values for TOC, TN, and TP for the training period were 1.37 × 10⁻⁵, 0.017, and 0.055, respectively, while those in the validation period were 2.08 × 10⁻⁵, 0.010, and 0.041, respectively.
The NSE values of the LSTM model for both the training and validation periods were above 0.75, which is within the "very good" performance range (0.75 to 1), for all the pollutants (i.e., TOC, TN, and TP) [39]. As shown in Figure 5, the LSTM model in this study simulated the temporal variations of those pollutants well. Since these temporal variations may result from pollutant transport characteristics, this result implies that the LSTM model can properly reflect the transport characteristics of each pollutant. These temporal variations have been well simulated in previous studies. For example, Zhang et al. [43] predicted the temporal variations of DO in the Burnett river using the PCA-RNN model with an R² value of 0.908. Choubin et al. [44] used the CART model to simulate the suspended solids in the Haraz River with an R² value of 0.67. These studies focused on simulating a single pollutant, while our study simulated the concentrations of multiple pollutants (i.e., TOC, TN, and TP).

Water Quality Simulation
The temporal fluctuations in TOC and TP were higher in the rainy season (June to October) than in the dry season. This can be explained by considering the rainfall patterns in South Korea. Most of the precipitation in South Korea falls in the summer monsoon season (June to September). TOC and TP were easily washed off by the rainfall, resulting in higher concentrations of these pollutants in the rainy season [45,46]. Schrumrf et al. [47] demonstrated that TOC increased with rainfall. Park et al. [48] also showed that TP was higher in the rainy season than in the dry season. However, the pattern of temporal variations for TN was different from that of the other two pollutants. The TN concentrations increased in the period from February to June. We surmised that nitrogen fertilizer application contributed to this increase. The fertilizers in South Korea are usually applied in spring and contain a large amount of nitrogen [49][50][51]. The NRB has a broad agricultural area that can influence the variation of TN. Karlen et al. [52] reported that higher TN in water was generally found after fertilizer applications.
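A seasonal comparison of this kind can be sketched by splitting a dated concentration series into rainy-season (June to October) and dry-season records and comparing their spread. The function name `seasonal_spread`, the use of the population standard deviation as the measure of fluctuation, and the four sample records are all assumptions for illustration; they are not data from the study.

```python
from datetime import date
from statistics import pstdev

RAINY_MONTHS = {6, 7, 8, 9, 10}  # June to October, as defined in the text

def seasonal_spread(records):
    """Return (rainy-season std, dry-season std) for (date, value) records."""
    rainy = [value for day, value in records if day.month in RAINY_MONTHS]
    dry = [value for day, value in records if day.month not in RAINY_MONTHS]
    return pstdev(rainy), pstdev(dry)

# Hypothetical TOC concentrations (mg/L).
records = [
    (date(2016, 1, 15), 3.0),
    (date(2016, 2, 15), 3.1),
    (date(2016, 7, 15), 2.5),
    (date(2016, 8, 15), 4.5),
]
rainy_std, dry_std = seasonal_spread(records)
```

For these made-up records the rainy-season spread is much larger than the dry-season spread, mirroring the qualitative pattern described for TOC and TP.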

Conclusions and Future Work
In this study, we combined two deep learning models (CNN and LSTM) to simulate the water level and three water quality parameters (TN, TP, and TOC) in the NRB. The CNN model was adopted to simulate the water level, while the LSTM model was selected to simulate the pollutant concentrations. We found the following in this study: (1) The CNN model simulated the water level with an NSE value of 0.933, which can be regarded as acceptable model performance. The water levels increased in the rainy season and were low in the dry season. This study suggests that the combined approach of the two deep learning techniques has promise as a tool for accurately simulating the water level and water quality and that this approach can contribute to developing effective strategies for better water sustainability and management. Although our model showed acceptable performance, only three pollutants were investigated in this study, whereas most process-based models can simulate many more water quality variables in addition to these three (e.g., chlorophyll, algae, dissolved oxygen, and fecal bacteria). Further study is recommended to develop deep learning models that can simulate more pollutants, including chlorophyll, algae, dissolved oxygen, and fecal bacteria. In addition, further study on deep learning models with "visual explanations", such as Gradient-weighted Class Activation Mapping (Grad-CAM) [53] and CAM [54], is required, because a deep learning model is a black-box model in which physical features are generally difficult to identify. Finally, the approach outlined in this study should be replicated with other datasets.