INTRODUCTION

Floods are among the most destructive natural disasters both in Russia and in the World, and their forecasting is a key factor in the protection of the population and economy from floods. In Russia, flood casualties occur every year, and since the early 2000s, more than 80% of people were killed by floods of rain origin and more than 50%, by flash floods [2]. To ensure the forecast advance time required for emergency services to respond and to alert the population in the case of flood hazard, the system of hydrological forecast is to be developed basing on monitoring data, up-to-date forecast methods, and skilled staff [6, 29].

Under current climate changes, the frequency and the scale of hazardous meteorological phenomena, including extreme storm rainfalls, are expected to increase in many regions of the world, thus making the problem of flood forecasting extremely hard at any time scale [22, 25, 27]. In Russia, the contribution of heavy showers to the total precipitation has been also increasing in the recent years [21].

The progress in forecasting flash floods can be engaged through increasing the availability of data (from gaging stations, the population, and remote sensing) and methods of their effective use, with the aim to improve the understanding of hydrometeorological phenomena and their socioeconomic effects [38]. In the Krasnodar Krai, a unique for Russia regional automated monitoring system for flood situation (AMSFS KK) has been operating since 2013, including more than 250 automated hydrological complexes (AHC) and more than 90 automated meteorological stations (AMS). Automated gages measure water level in rivers with a frequency of once in 10 min, along with some other weather characteristics, and transmit the data on a real-time basis to the information server of the system through cellular communication channels based on technologies of Emercit company [12]. Advantages of the system are the high density of the monitoring network (<170 km2 per 1 AHC in the mountain part of the territory), the high frequency of data transmission, and their open-access in the Internet, emphasizing its significance in ensuring online monitoring of floods and alerting the population about floods. AMSFS KK fits well into the international practice of developing local hydrometeorological systems for solving monitoring problems and flash floods forecasting [46]. At the same time, the array of high-frequency data acquired by the regional monitoring system can be used to develop methods for simulating and forecasting flash floods. Notwithstanding the relatively short operation period of the Emercit system, measurement results are being actively used by researchers to study hydrological and geomorphological phenomena [5, 13, 14].

The achievements in the field of artificial intelligence (AI) and machine learning (ML) [23, 31, 32, 40] provide promising methods for incorporating the full potential of large datasets for the analysis and forecasts with the aim to provide accurate and timely data to stakeholders, the population, and the authorities.

The problems of short-term hydrological forecasting often require not detailed description of the processes of runoff formation but an accurate description of the response of the river system to meteorological forcing data [17]. This is done with the use of simplified models relating previously observed weather and river flow with the future river flow by effectively incorporating the (auto)correlation of time series [4].

The problem of forecasting water levels in the mountain rivers in Krasnodar Krai is of great importance because of the frequent floods. Water levels are characteristics that can be measured more accurately and easily than water discharges, especially in mountain rivers, and forecasting water levels at a downstream gage of a river reach, considering the observations at the upstream gage is a common hydrological task. The objective of this study is to construct machine learning models for forecasting water levels in the rivers of Pshish (at Guriiskaya Vil.) and Mzymta (at Kazachii Brod Vil.) in Krasnodar Krai, using data on water levels at upstream gages. For the first time we use initial dataset containing high frequency water level observation data on the AMSFS KK network. We employ the three widely used ML model architectures for model construction, i.e., decision tree regression model M5P [37], gradient boosting of decision trees XGBoost [20], and an artificial neural network based on multilayer perceptron (hereafter, MLP [24]). The forecasting effectiveness is tested for the forecast lead time from 1 to 20 h.

STUDY OBJECT

The catchments of the Pshish and Mzymta rivers (Fig. 1) lie on different macroslopes of the Western Caucasus. These rivers were chosen because of their similar catchment sizes (Table 1) and contrast natural conditions in their catchments. The Pshish River is a tributary of the Kuban in its middle reaches; it contributes to the Krasnodar Reservoir. The Mzymta basin lies in the southern Russian part of the Black Sea coast at the Caucasus. This river flows into the Black Sea. The upper reaches of the Mzymta lie further westward in a higher part of the Greater Caucasus; the maximum elevation in the basin is >3250 m, while that on the ridges of the Pshish River catchment is as little as 1800 m. The relief and elevation of the catchments determine their climate features, snow accumulation regime, and the water regime of rivers.

Fig. 1.
figure 1

Schematic map of the catchments of (1) the Pshish and (2) Mzymta rivers and the layout of the chosen AHCs.

Table 1. Characteristics of the Pshish and Mzymta rivers and their catchments for the chosen gages of AHC Automated Monitoring System for Flood Conditions in Krasnodar Krai

The Pshish River catchment lies in the zone of temperate climate with annual precipitation >900 mm (with 950 mm in Khadyzhensk T., and 1660 mm in Gornyi Vil.), the snow cover is intermittent. The Mzymta R. basin lies in the zone of humid subtropical climate with an annual rainfall of >1300 mm (1377 mm in Adler City, 1795 mm in Krasnaya Polyana Town) [10]. Part of the catchment above 600–1300 m belongs to the zone of seasonal snow cover, where snow falls every year and lies during >120 days, thus enabling the operation of ski resorts on the slopes of the Mzymta valley.

According to P.S. Kuzin’s classification [7], the Pshish is a river with rain floods occurring all year round, though more frequent in the cold season, while the Mzymta is a river with spring–summer freshets and rain floods in all other seasons. The mean maximal level increase in the Pshish R. is 2.9 m at AGK-95 and 6.1 m at AGK-50. The mean rate of level rise during freshets is 0.1–0.6, and maximum of 1.1 m/h. The concentration time between the peaks of high floods in the Pshish River varies from 18 to 28 h, which corresponds to the flood wave propagation velocity of 3−4 m/s in this river reach.

The range of level variations in the Mzymta R. during flood events is 2–3 m. The rate of level rise is commonly relatively low in the upper reaches (0.05–0.24 m/h), while that in the lower reaches is higher―up to 0.5 m/h. In the lower reaches, flood formation often begins earlier because of damp air moving from the sea [1]. Peak levels can also occur earlier in the lower gage. A definitive case is the flood of June 25, 2015, when showers caused floods on rivers in the coastal zone and in low mountains; at Kazachii Brod, the flood peak passed at 1 p.m., and air masses rose into the mountain part a day later and formed a flood peak at the confluence of the Mzymta and Pslukh rivers at 10 a.m. on June 26.

Notwithstanding the relatively short operation period of AMSFS KK, its sensors have already recorded extreme water level rises, during the flood on October 24–25, 2018, the level rise in the Pshish R. was >6.2 m near Navaginskoe Vil. and 9.3 m at Guriiskaya Vil. According to data of Roshydromet gages for Khadyzhensk T. and Bzhedukhovskaya Vil., this was the highest flood over the observation period since 1936. On the same dates of October 24–25, 2018, a high flood was developing in the Mzymta R. with level rises by 2.0–3.1 m upstream of the confluence with the Pslukh R. and at Kazachii Brod Vil. In the area near Kazachii Brod, this flood ranked second in magnitude after that of October 25, 2003 (or third, taking into account the highest flood recorded on the gage at Kepsha Station on June 26, 1956) during the period of instrumental observations.

Available Hydrometeorological Data

In the chosen rivers, observations on AMSFS KK network are carried out in parallel to observations at Roshydromet gages, thus enabling mutual quality control of the obtained data. The regional AMSFS KKnetwork includes 3 AHCs on the Mzymta R. and 3 AHCs on its tributaries. On the Pshish River and its tributaries, 6 and 2 AHCs are now in operation, respectively. In this study, two pairs of AHC AMSFS KK were chosen at each river with the most complete observation series (with few gaps and outliers) observation series (Table 1). The distance between those gages is 86.5 km along the Pshish R. and 40 km along the Mzymta R.

Processing Data of Water Level Measurements, Removal of Spikes, and Filter Tuning to Reduce Noise in the Series

The study uses the archive dataset of 10-min water level measurements at AHC-95, AHC-50, AHC-89, and AHC- 160 from April 2014 to January 2021.

Due to the design features of the measuring radar sensor and high measurement frequency the time-series are noisy and contain numerous outliers. The majority of the existing procedures used to analyze time series for outliers, spikes, or anomalies are unsuitable for the analysis of water level variations in small mountain rivers because of the specifics of their hydrological regime. The abrupt and fast water level rises, typical for West Caucasian rivers, analyzed with the use of statistical tests known to the authors (Chauvenet’s statistical criterion, Grubbs’s test, Pearse’s test, Dixon’s Q-test) rejected the null hypothesis assuming that the values considered are not outliers. Therefore, the following algorithm has been developed to remove such values caused by measurement errors. First, the regular measurement interval of 10 min was introduced. Next the series was analyzed to identify outliers through revealing events in which water level rose by >1.5 m within 10 min and next dropped by >40 cm. The proportion of outliers at selected AHC varied from 0.05 (18–25 values, associated with sensor calibrations, according to data of JSC Emersit) to 0.24%. Within the 7 years of sensor operation, the proportion of gaps varied from 2.29 to 7.77% of series length. The identified erroneous data were removed, and the gaps were filled by linear interpolation.

After the determination of outliers and the restoration of a regular measurement step, the data were filtered to remove high-frequency noise, typical for this kind of data. The methods of linear and nonlinear filtration [39] are not suitable for this problem, because they largely reduce the actual range of level variations during floods. Therefore, this was made by Savitzky–Golay smoothing method [42], which is widely used to process high-frequency observation data in various fields [28, 45]. A distinctive feature of this filter is the ability to smooth noises and to preserve the actual variation amplitude [36]. The smoothing is carried out using 41 samples wide rolling window with 8th degree polynomial. It is followed by linear interpolation aimed to fill in the small gaps <8 h that form due to the removal of outliers and faulty measurements. This is necessary to obtain the most complete series for further modeling.

Preparation of Dataset for Experiments

Matrices of data from 2014 to 2020 were prepared for computation experiments. The calculations were made for a step of 1 h; therefore, before their use, the data were converted to a one-hour step. For each hour, water level was forecasted with some lead time based on the data available at the moment of forecast. The datasets included a set of 54 predictors, i.e., water level series obtained by processing the initial AMSFS KK data. The same dataset contained 8 predicates―series of the forecasted characteristic (water level) with a lead time from 1 to 20 h (the future water levels Lt+1, Lt+2, Lt+3, Lt+5, Lt+10, Lt+15, Lt+18, Lt+20).

The predictors included:

water levels at the upstream gage Ht-τ (m) at the moment of forecasting and for the previous period τ = 1…48 h;

water levels at the downstream gage Lt-τ (m) at the moment of forecasting and for the previous period τ = 1…24 h.

The construction of this dataset allows to account for flood wave routing time along the river channel. The initial datasets are arranged by rows. The data in the first position of each row are the date and the hour t for which the experimental forecast is made.

Next the series of water levels were divided into the training and testing samples according to the recommendations [18] with a sufficient length and similar statistical characteristics. The training period for the Pshish River was chosen from March 3, 2015, to August 31, 2018 (70% of the data), and the testing period was from September 1, 2018, to January 5, 2020 (30%). The respective periods for the Mzymta were from January 1, 2014, to August 31, 2018 (teaching, 80%) and from September 1, 2018, to January 5, 2020 (check, 20% because of gaps).

THE MODELS AND THE METHODS OF THEIR CONSTRUCTION

Description of M5R Model

The algorithm M5R is based on the algorithm of construction of regression model trees M5 [37]. This algorithm includes three stages. At the first stage a decision tree is constructed, where the splitting criterion minimizes the variations of predicate values in each branch. At the second stage, truncation is made at each leaf. At the truncation, the inner nodes of the tree form a regression plane. At the third stage, to prevent the formation of breaks, a smoothing procedure is applied. It combines the forecasts of the model in a leaf with the values for each leaf toward the root of the decision tree. An advantage of the algorithm is its compactness and the relatively simple interpretation of the obtained models. The algorithm uses unscaled initial data.

Description of XGBoost Model

XGBoost is gradient boosting of decision trees. Boosting is a technique that helps successively constructing the composition of algorithms; in this case, each new algorithm is constructed in order to minimize the errors of the previous one. The base algorithms of the composition are shallow decision trees.

Suppose we have a composition of N – 1 decision trees:

$${{a}_{{N{\kern 1pt} - {\kern 1pt} 1}}}(x) = ~\mathop \sum \limits_{n{\kern 1pt} = {\kern 1pt} 1}^{N{\kern 1pt} - {\kern 1pt} 1} {{b}_{n}}(x).$$
(1)

At the next step, we have to add an algorithm \({{b}_{N}}(x)\) that will decrease the composition error on the learning sample. In other words, we have to determine which values \({{s}_{1}}, \ldots .,{{s}_{t}}\) are to be taken by algorithm \({{b}_{N}}({{x}_{i}}) = {{s}_{i}}\) on the objects of the learning sample for the error of the entire composition to be minimal:

$$s = \mathop {{\text{argmin}}}\limits_r E(r) = \mathop {{\text{argmin}}}\limits_r \mathop \sum \limits_{i{\kern 1pt} = {\kern 1pt} 1}^l L({{y}_{i}},{{a}_{{N{\kern 1pt} - {\kern 1pt} 1}}}({{x}_{i}}) + {{r}_{i}}).$$
(2)

Now, the vector s can be sought for as the antigradient of the error function:

$$s = - \nabla E.$$
(3)

Therefore, training the algorithm \({{b}_{N}}(x)\) is the minimization of the quadratic error function of algorithm responses and elements of vector s.

The model is written in Python language. Library xgboost, version 0.90, was used for the construction. Library sklearn was used to validate the model. The input data for the model were standardized variables within the range [–1; 1].

Description of MLP Model

The neural network model commonly used for hydrological problems [16, 33, 43] is multilayer perceptron MLP [24]. This is a computation network consisting of artificial neurons, simulating brain neurons. A mathematical model of neuron was first proposed in [30], however, it started rapidly developing only after the introduction of backpropagation algorithm [41] for evaluating model parameters (the weights of the input variables). Multilayer perceptron model is based on the use of a single or several layers of artificial neurons, which receive many observation data series on their inputs (for example, water levels with different time shifts) and transform them into an output signal (for example, water levels with a specified lead time) through nonlinear transformations and combinations of input signals with various weights. Mathematically, the artificial neuron can be represented by the dependence

$$y = ~\,\,{{\varphi }}\left( {\mathop \sum \limits_{i{\kern 1pt} = {\kern 1pt} 1}^n {{w}_{i}}{{x}_{i}} + b} \right),$$
(4)

relating the output signal y with a vector of input data \({{x}_{i}}\) (i = 1, 2, …n), which enter the model with different weights \({{w}_{i}}\), and neuron activation threshold b via activation function \({{\varphi }}\). The activation function is commonly represented by a nonlinear sigmoid function of the form:

$${{\varphi }} = \frac{1}{{1 - {{{\text{e}}}^{{ - Sx}}}}}.$$
(5)

The weights with which the vectors of source data enter the artificial neural network were determined based on the comparison of the output signal with the learning vector (observation data) and back propagation, which leads to repeated weighing. This process is referred to as learning a neural network. The learning process is to minimize the chosen error function, which is commonly the root-mean-square deviation (though any other function can be chosen).

The nonlinear relationships established between ANN input and output, as well as the use of any other input data for automated search for model structure allows ANN to be used in various problems, not limited to hydrology. In terms of the structure of hydrological models, ANN is a classical black box model [19]. Such model does not require a detailed description of the system to be simulated (in our case, the processes of runoff formation in a river basin) for identifying relationships between its inputs and outputs. An advantage of ANN, as applied to complex nonlinear systems, over models with distributed parameters is their less strict requirements to input data. The initial data for the model are standardized characteristics varying within the range [–1; 1].

The process of ANN model development consists of the following. The measurement data used to develop the model are divided into two groups: training and testing. First, the ANN model is trained on the first set of data, and the simulation result is compared with observation data to evaluate simulation quality. Next, the model is tested on an independent dataset under a specified condition regarding the admissible decrease in model quality. The model thus developed and trained can yield an output signal whatever the amount of input data.

The model for forecasting water level variations based on measurement data from AHC AMSFS KK was developed with the use of software components of scikit-learn library of Python language [35]; the regression problems with training were solved with the use of MLPRegressor software. The main parameters of the model are the number of its hidden layers, the number of neurons in them, the form of the activation function, the algorithm used to search for the global minimum of the optimization function, and the maximal number of iterations. The models’ hyperparameters were optimized by grid search algorithm with ten-fold cross-validation (the training data set is randomly divided into 10 parts, of which 9 are used for parameter tuning and 1 is used for testing; next the dataset is divided in another way; this test is repeated 10 times). These procedures are also implemented as functions of scikit-learn library.

The Methods Used to Evaluate Model Quality

We used the following metrics to evaluate the models’ performance. The root-mean-square error of the simulated values from the observed was calculated using the formula

$$S = \sqrt {\frac{{\sum\limits_{i{\kern 1pt} = {\kern 1pt} 1}^N {{{{(H_{i}^{{{\text{sim}}}} - H_{i}^{{{\text{obs}}}})}}^{2}}} }}{N}} ~,$$
(6)

here \(H_{i}^{{{\text{sim}}}}\) is the simulated water level, m; \(H_{i}^{{{\text{obs}}}}\) is the observed water level; N is the length of the series.

The Nash–Sutcliffe model efficiency coefficient NSE [34]:

$$NSE = 1 - \frac{{\sum\limits_{i{\kern 1pt} = {\kern 1pt} 1}^N {{{{(H_{i}^{{{\text{sim}}}} - H_{i}^{{{\text{obs}}}})}}^{2}}} }}{{{{{\sum\limits_{i{\kern 1pt} = {\kern 1pt} 1}^N {(H_{i}^{{{\text{obs}}}} - \overline {{{H}^{{{\text{obs}}}}}} )} }}^{2}}}}~.$$
(7)

The Nash–Sutcliffe coefficient NSE vary within the range \(\left( { - \infty ;1} \right]\), at NSE > 0.75, the simulation quality is excellent, at 0.36 < NSE < 0.75, it is satisfactory; and at NSE < 0.36, unsatisfactory.

The quality of the tendency forecast of the models was evaluated by the characteristic S/σΔ [4, 9], where σΔ is the deviation of the inertial forecast for each lead time:

$${{{{\sigma }}}_{{{\Delta }}}} = ~\sqrt {\frac{{\sum\limits_{i{\kern 1pt} = {\kern 1pt} 1}^N {{{{\left( {{{\Delta }_{i}} - {{\bar {\Delta }}}} \right)}}^{2}}} }}{{N - 1}}} ,$$
(8)
$${{\Delta }_{i}} = H_{i}^{{{\text{obs}}}} - H_{{i{\kern 1pt} + {\kern 1pt} {{\tau }}}}^{{{\text{obs}}}},\,\,\,\,~{{\tau }} = 1 \ldots 20.$$
(9)

The values of SΔ vary within the range \(\left[ {0; + \infty } \right)\), with the values of 0–0.5 implying excellent; 0.5–0.7, good; 0.7–0.8, satisfactory; and >1, unsatisfactory quality of the forecast procedure.

It is worth mentioning that the metrics given above are commonly used to evaluate the quality of simulation and forecasting water discharges; however, they can be used for water levels as well.

RESULTS

In the machine learning models that incorporate linear regression, in our case, M5P and XGBoost, the degree of association of predictors and the predicate often determines the quality of modeling and can be evaluated beforehand through (auto)correlation analysis of variables. For the chosen pairs of AHC in the examined rivers, we estimated the correlation between water levels at a downstream AHC with different lead times and the levels at an upstream AHC with different time shift. In the case of the Pshish River, the correlation between the downstream AHC-95 and the upstream AHC-50 was highest―up to 0.93―at the lead time of 15–20 h (Fig. 2a). Similarly, the travel time, determined by the time of passage of appropriate high flood peaks in this reach, as mentioned above, is ~18 h. The autocorrelation at AHC-95 at an lead time up to 8 h, never falls below 0.95 and that at an lead time up to 13 h, below 0.90 (Fig. 2b). In the case of the Mzymta, because of the considerable changes in the drainage area between AHCs and a large lateral inflow, the correlation between the levels at the downstream AHC-160 and the upstream AHC-89 is not high (Fig. 2c): it is not greater than 0.74 for the lead time of 1 h and starts decreasing after 5 h. At active snow melting in the Mzymta R. catchment, the travel time of the wave of the daily level variation in the reach under consideration is 2–3.5 h. Conversely, the autocorrelation between the hourly levels of AHC-160 is relatively high: it is 0.97 below 1 h and 0.90 below 17 h (Fig. 2d). The obtained estimates correspond to the travel time between AHCs.

Fig. 2.
figure 2

Corellogram of water levels in (a, b) the Pshish and (c, d) Mzymta rivers between the upstream and downstream gages and (b, d) autocorellogram at the downstream gage.

Grid search in the models’ hyperparameter space with ten-fold cross-validation on the training sample was used to determine optimal parameters and structures of the models and to evaluate modeling quality criteria with lead times of 1, 2, 3, 5, 10, 15, 18, and 20 h (Table 2). The structure of the classification tree for model M5R became more complex with an increase in the lead time, though the interpretation of the chosen model configuration is available for any lead time in the form of a problem-solving graph (as, for example, in Fig. 3 for the lead time of 1 h). In the process of XGBoost model tuning, possible combinations of parameters were tested: the number of trees \( \in \)[100,300,500,800], the maximal depth of trees \(~ \in \) [3, 5, 8]. The best results were obtained with a model with 1000 trees with depth ≤5. In MLP model for the lead time of 5 h, a structure with one layer containing 10 neurons was used, while at another lead time, the structure is more complex―two or three layers each containing 10 neurons.

Table 2. Estimates of simulation quality of water levels in the Pshish R. at Guriiskaya Vil.
Fig. 3.
figure 3

Decision tree of M5P model for water level forecast in the Pshish river at Guriiskaya Vil. at lead time of 1 h.

For the Pshish River, the best result in terms of NSE values was obtained for all models on the training and test samples with the lead times of 1–5 h. In the case of lead times less than 5 h, the root-mean-square error of simulation S ≤ 0.2 m. In the test period, the error S increases faster than in the calibration period and amounts to 0.25–0.41 m. As the lead time increases, the quality estimates in the form of NSE criterion steadily decrease, though remaining greater than 0.8, which corresponds to an excellent result of simulation. Tests based on independent data show that the best estimates were given by models M5P and XGBoost, in the latter case, with the lead time ≥10 h.

Estimates of the prognostic properties of simulation results by the SΔ criterion show that the best forecast can be obtained at the lead times of 15 and 18 h, which is close to the travel time in the reach between the chosen gages AGK-95 and AGK-50 in the Pshish river and which is in agreement with the results of correlation analysis.

Simulation of individual flood events in the Pshish river was analyzed. Below, we discuss the simulation of the highest (historical) flood of October 24–26, 2018, with examples of lead time of 5 and 15 h (Fig. 4).

Fig. 4.
figure 4

Actual and simulated plots of water level variations at the passage of flood of October 25–26, 2018, in the Pshish river. The lead time is 5 h in the left part and 15 h in the right part.

For the lead time of 5 and 15 h in the case of the flood of October 25, 2018, MLP model appreciably overestimates the maximal level―by 2.3–3.2 m, respectively. MLP model predicts earlier rise of the level to the marks of an adverse event (AE) (71.802 m) and a hazardous event (HE) (72.802 m) 5 h earlier than their actual time at the lead time of 5 and 4 h earlier than that at the lead time of 15 h. At the flood decay, MLP model demonstrates an abrupt drop in the level, which may be due to the interpretation by the model of a two-peak flood at the upper gage AHC-95, the levels of which are components of predictors for level forecasts at the lower gage AHC-50.

Some perturbation at the flood rise can be seen in the results of M5P model; however, this perturbation decays quickly. At the lead time of 5 h, M5P model overestimates peak floods by 1.1 m; at an lead time of 15 h, the model predicts level reaching AE mark 2 h earlier, but the forecasted time of HE mark is later and the level is 1.0–1.3 m lower than the observed values.

The results of simulation of the same flood by XGBoost model are less convincing, because the model predicts later times of AE and HE marks and underestimates the water levels by 1.0–1.7 m. In that case, at the lead time of 5 h, XGBoost model always underestimates peak water levels, and at the lead time of 15 h, its behavior is similar to that of M5P model, and its forecast of the time of HE level is 4 h later. Thus, the actual lead time in the forecast of HE mark time is 19 and 12–11 h by the models MLP and M5P, respectively.

Overall, the best results of simulation for the historical flood of October 24–25, 2018, on the Pshish river are obtained by two models: M5P and MLP. It is also worth mentioning that XGBoost shows smoother flood rises and falls, unlike MLP with its more abrupt response to changes in predictors. Models M5P and MLP at the forecast can overestimate the values of flood levels in the domain of the values not observed before. Notwithstanding the poorer result of MLP model, estimates NSE and S/σΔ show good prognostic qualities of the model in the case of an outstanding flood.

Similar tests of the models were made for the Mzymta river. The obtained results are given in Table 3. Note that the efficiency of the models estimated by NSE, decays with each step at the lead time from 0.98 to 0.6 much faster than it does in the case of the Pshish river. This is due to the much shorter travel time between the upstream AHC-89 and downstream AHC-160, which is <4 h. In addition, at high floods, the peak forms earlier at the downstream gage because of the direction of motion of air masses from the coast toward mountains and the successive inclusion of parts of the catchment in the runoff formation and transformation processes, which reduces the role of levels at the upstream gage as predictors. The root-mean-square errors of simulation vary from 0.02 to 0.40 m over the training period and from 0.03 to 0.19 m over the test period.

Table 3.   Estimates of simulation quality of water levels in the Mzymta R. at Kazachii Brod Vil.

All three models show an appreciable decrease in NSE efficiency over the test period, though most stable results in this case were obtained with the use of multilayer perceptron. The considerable decrease of the efficiency M5P and XGBoost models over the verification period can be attributed to overfitting.

In the case of the Mzymta river, the prognostic efficiency by criterion SΔ is not met. Better simulation results were obtained for the lead times of 3 and 5 h. This demonstrates the need to incorporate additional hydrometeorological predictors for the development of more efficient prognostic procedures and for increasing the lead time. Similarly, the simulation of individual flood events was analyzed for the Mzymta river. The highest flood of October 25–26, 2018, included in the test period, is described by all models with underestimation of maximal water levels and a time lag relative to the observed values (Fig. 5).

Fig. 5.
figure 5

Actual and simulated plots of water level variations at the passage of flood of October 25, 2018, in the Mzymta river. The lead time is 1 h in the left part and 5 h in the right part.

DISCUSSION

The quality of water level simulations over the chosen period for the Pshish river is higher than that for the reach of the Mzymta river. The observed differences between simulation quality in the reaches of the Pshish and Mzymta can be attributed to differences between the natural conditions in their catchments. The area of the lateral part of the catchment is larger than that of the catchments for upstream gages (by factors of 1.3 and 1.9, respectively); however, the inflow has no considerable effect on the propagation of the flood wave in the Pshish river alone. The lateral catchment in the Pshish river reach from Navaginskoe Vil. to Guriiskaya st. has fairly smooth slope with elevations ≤700 m, and the processes in it appear to be synchronous with those in the upper catchment, where the runoff of high floods is mainly formed due to high rainfall. Downstream of Khadyzhensk town, additional inflow can be very low because of the small rainfall-runoff ratio in the lowland part of the catchment, and the character of the Pshish river becomes transit [8]. The large travel time in this reach is due to the small channel slope (with a drop from 180 to 80 m abs.).

Within the boundaries of the higher catchment of the Mzymta R., the processes of runoff formation and flood wave transformation in the examined reach are more complicated because of altitudinal zonation (which affect the proportions between the snowmelt and rain runoff) and a system of differently directed ridges, which affect moist air masses’ movement. The catchment of the Mzymta from its confluence with the Pslukh river to Kazachii Brod includes middle-mountain Krasnopolyanskaya depression (with elevations of 500–2000 m) and a low-mountain zone of the so-called pre-ascent of air masses (up to ~1000 m), open toward the Black Sea coast [10]. Here, the lateral inflow is a powerful factor that affects the variations of water level and hampers the development of level regime models taking into account only the upper gages (ANS-89) located in the upper reaches of the river. In addition, dredging and bank protection operations are carried out in the river channel even after the Winter Olympic Games 2014; these operations can have an effect on the level regime near the upstream gage. In the area near the downstream gage at Kazachii Brod, the level regime in some periods (e.g., October 17–28, 2019) shows the effects of releases from the Krasnopolyanskaya HPP with a range of variations up to 25 cm. The small travel time in this reach is due, in particular, to the large channel slope (with a drop from 650 to 70 m abs. in a reach half as long as that in the Pshish river).

In the Pshish river case study, which shows considerable correlation between water levels at the upper and lower AHC, the best results were obtained using decision trees boosting model, i.e., in essence, an ensemble of linear models, which most effectively utilizes the correlation within the sample. The complex nonlinear MLP model showed the worst results among the three models at each lead time, clearly, because of overfitting. However, its results can have a high prognostic potential for high floods, as it has been shown in the case of an extreme flood in October 2018. The effectiveness of the forecast is highest at the lead time of 15 and 18 h. Therefore, the results of simulation for the Pshish river, which simulate the patterns of flood wave transformation in a river reach, complement the results obtained by the method of corresponding levels (water discharges) for a larger river in the region, i.e., the Kuban [15], though with a one-hour instead of one-day step and with the use of state of the art technologies.

In the Mzymta river case study the relationship between the predictor and predictant is significantly weaker because of the substantial and asynchronous lateral inflow (Fig. 2). The nonlinear model MLP was found to be best among the three models, which can be an indirect indication to its ability to detect additional relationships within the sample. And, although the simulation quality with an lead time up to 5 h is evaluated by NSE criterion as excellent, the stricter criterion of forecast efficiency SΔ does not consider the results as satisfactory. Details of behavior of this criterion with an increase in lead time are discussed in [11]. More effective forecast models for water levels and/or discharges in the Mzymta river can be developed with the incorporation of additional data on the behavior of tributaries and meteorological data.

CONCLUSIONS

The study showed that machine learning methods applied to simulate water levels can give reliable forecasts. For the lower reaches of the Pshish river, the optimal lead time is 15–18 h. In the case of the Mzymta river at Kazachii Brod, with data on water levels at the upper gage at the confluence with the Pslukh river taken into account, the quality of simulation is estimated as good, though the required forecast efficiency has not been achieved. This is the effect of the considerable lateral inflow in the middle and lower reaches of the river and the asynchronous runoff formation in different parts of the catchment.

For the reach of the Pshish river, the best results of water level simulation were obtained with the use of the model of decision tree boosting (XGBoost), e.g., an ensemble of linear models, which most effectively use the correlation within the sample. Models M5P and MLP can successfully supplement and improve the forecasts in the case of high floods.

In the case of the Mzymta river, on the contrary, the best among the three models was the nonlinear MLP model, a fact, which may imply its ability to identify nonlinear relations within the dataset. Further studies will be focused on the search for interpretable neuron networks. This is currently a compelling research topic [26, 44]. To improve the simulation quality of water level in the Mzymta River, it is reasonable to incorporate additional source data on water levels in tributaries and meteorological data.

Overall, the use of machine learning methods to construct procedures for forecasting floods in Krasnodar Krai rivers based on data of AMSFS KK network looks promising. The accumulation of long enough sets of precipitation data with an hourly step will enable studying the potential of artificial intelligence models in problems of rainfall-runoff forecasting and nowcasting.