A Tarso model for studying the level of the Cuiabá river

The Cuiabá River registers cyclical occurrences of floods and droughts over the years, according to the hydrological periods, thus characterizing a non-linear behavior. The prediction of the level of the Cuiabá river is important to help institutions such as the Civil Defense of the state of Mato Grosso and many other institutions that are concerned with the prevention and mitigation of natural disasters. Thus, this study considered the nonlinear Threshold Autoregressive Self-Excking Open-loop (TARSO) model with 2 regimes, with a Bayesian approach. We tested models to which values of the linimetric quota (river water level in millimeters) with and without rainfall (mm) were associated. All models were compared using the lowest DIC, MAPE and MSE criterion, and the TARSO (2; 1, 0, 3, 1, 1) model performed best according to these criteria. Finally, the selected model was shown to produce reliable predictions.


Introduction
Natural disasters have been studying more frequently in the international science community due to the conflicting relationship between society and nature worldwide, even as the consequences of catastrophic events for the population of affected regions. Brazil still lacks papers that purport to present preventive decision-making tools to reduce the outcomes caused by natural phenomena.
These catastrophes are frequently triggered by heavy rainfalls. When precipitation is intense but cannot soak into the soil fast, a large part of its water volume flows into the drainage system, surpassing its natural drainage capacity. The surplus water volume not drained by the soil fills the floodplain, submerging according to the topography of the areas close to the rivers (Tucci, 2004). Generally, these phenomena are enhanced by human changes in the environment, such for example, the waterproofing of the soil and the rectification of watercourses due to urban interventions (Goerl & Kobiyama, 2005). River, Brazil using the SETAR model, with a threshold set by the median.
This work aims to analyze the behavior of the time series of the daily quota of the Cuiaba River and propose forecast models through the TARSO model.

Study area
The Cuiaba watershed is located in the western part of central Brazil, in the state of Mato Grosso, with a total area of approximately 29,000 km 2 with 841 km in perimeter belonging to the Upper Paraguay River basin. The basin is located between the geographic coordinates 1418 and 1700S and 5440 and 5655W. The headwaters belong to the municipality of Rosário Oeste, on the riverside of Serra Azul, and their main sources are the Cuiaba da Larga and Cuiaba do Bonito rivers. After the confluence of these rivers, it changes its name to Cuiabazinho river and, only after the encounter with the Manso, it turns to the name of Cuiaba River (de Almeida et al., 2020; MATO GROSSO, 2003).
The Cuiaba river basin consists of two large geological formations with well-defined biotic and abiotic characteristics: the Pantanal flatland and the surrounding highland and mountain ranges. These orographic characteristics enable us to distinguish three different regions in the Cuiaba river basin, namely: Upper Cuiaba, Middle Cuiaba, and Lower Cuiaba (de Almeida et al., 2020)

Pluviometric anad Fluviometric data
The pluviometric and fluviometric daily data were obtained from January 1st, 2001, up to December 31st, 2012, resulting in a total number of 4384 observations for each data set. The fluviometric data consists of measurements from the limnimetric level (river water level in millimeters), and the pluviometric data consists of rainfall measurements (mm). The limnimetric quotas refer to the Rosário Oeste stations (code 66250001) obtained from the Hidroweb system of the National Water Agency (ANA). Furthermore, the rainfall data were collected through the TRMM satellite (The Tropical Rainfall Measuring Mission).

TARSO MODEL
In this work, it is considered the TARSO model with two regimes and threshold value r (Tsay, 1998). Consider the model where • Y t is the linimetric level of the Cuiabá river at time t; • X t is the precipitation at time t • r is the threshold parameter which breaks the series into two parts; • d > 0 is the lag parameter; • p 1 , p 2 , q 1 , q 2 are orders of submodels in each regime; • ϕ and θ are the autoregressive parameters; • ε t is the white noise, uncorrelated with zero mean and constant variance.
The equation 1 assigns that Y t belongs to two different processes with different orders (k, p 1 , p 2 , m 1 , m 2 , q 1 , q where k is the number of regimes that depends on the threshold variable X td . It is denoted by Assuming that the orders (p i , q i ), i = 1, 2 are known, the model parameters are (γ i , τ i ) where ..X t-p i ) with i = 1, 2 the model 1 can be matrix rewritten as Let D = Y t , X t , t = p + 1, ..., n the entire sequence, p = max(p 1 , p 2 , q 1 , q 2 ) and d a fixed value, in which the model is conditioned and n 1 , n 2 the number of observations in each regime. Conditioning in the p-first observations then ε (1) t ∼ N(0, τ i ), the conditional likelihood function can be approximated by

Statistical Analysis
All analyzes were performed in the R Core Team (2020) software. Initially, descriptive analyzes of the linimetric quota series and rainfall were performed using the graphs of the time series, as well as the application of the Cox-Stuart and g Fisher tests (Morettin & Toloi, 2006) to verify, respectively, the existence of trend and seasonality.

Bayesian Analysis
The Bayesian analysis was performed considering the article by Sáfaddi & Morettin (2001) assuming a previous distribution by Jeffreys: The parameters r and d are known, their values were specified as r = 119 (median of the quota), and d = 1 in the same way as used by de Almeida et al. (2020).
Thus, the posterior distribution is gamma-normal and the complete conditional distributions of γ i and τ i are given by The posterior distribution estimates for γ i and τ i were obtained using the MCMC method and the Gibbs sampling algorithm Geman & Geman, 1984.
The models were fit to datasets considering the different combinations of • with and without the parameters ϕ 10 , β 10 , ϕ 20 , β 20 Figure 1 exhibits the time series of the daily average linimetric quotas at the Rosário Oeste Station, in which a seasonal behavior is observed. The Fisher g test shows the existence of seasonality (p-value <0.0001) with a period of oscillation of the series around 366 days, which was determined by the periodogram (Morettin & Toloi, 2006), with the maximum in the rainy periods and minimal in the dry period. The Cox-Stuart test had no trend (p-value=0.8639), indicating that there was no gradual increase or decrease in the series over time.   The cross-correlation function between the mean elevation and precipitation is shown in figure  3. In positive and negative lags they are significant up to around 30 days, indicating that precipitation positively influences the average quota and the opposite also occurs, but it is also observed that positive lags have a greater correlation, indicating a greater force of precipitation in influencing the quota.

Models
Among the 14,400 fitted models, only 38 had all significant parameters, and their configuration of these models is presented in table 1. Of these 38 models, we have to: • In the first regime for the quota autoregressive order 31 models with p 1 = 1 and 7 models with p 1 = 2, all models presented m 1 = 0, indicating that when the quota is lower than its median, its value depends on 1 or two previous ones and has no influence of precipitation. • In the second regime for the autoregressive order of the quota, 22 models with p 2 = 1, 1 model with p 2 = 2 and 15 models p 2 = 3, for rainfall variable 19 models presented m 2 = 0 and m 2 = 1, indicating that when the quota is higher than its median, its value depends on the previous 1 to 3 and is influenced by the previous day's rainfall.
From Tabel 1 it is also possible to observe that the models 26 and 29 presented better performance since they have the lowest DIC value (9503,4).   and from the 95% credible intervals, all parameters are found to be meaningful. From the estimates, it is possible to observe that: • In the first regime, the quota value revolves around a constant 2.97 and is positively affected by the value of the previous day, it does not influence precipitation; • In the second regime, the quota value is influenced by up to three past values, with the first and third day having a positive influence, and the second day negatively. The quota is also positively influenced by the past day's rainfall.  Figure 5 shows the densities and final chains of the parameters for model 29 -TARSO(2,1,0,3,1,1) with intercept ϕ 10 and ϕ 20 .
The table 3 shows the estimates for model 29 -TARSO(2,1,0,3,1,1) with intercept ϕ 10 and ϕ 20 -where all parameters are found to be significant. From the estimates, it is possible to observe that: • In the first regime, the quota value revolves around a constant 3.00 and is positively affected by the value of the previous day and it does not influence precipitation; • In the second regime, the quota value revolves around a constant 13.24 and it is influenced by its values of up to 3 previous ones, with the first and third day having a positive influence, and the second day negatively. The quota is also positively influenced by the previous day's rainfall.  Figure 6 shows the daily average linimetric quotas observed and estimated by the TARSO model(2,1,0,3,1,1) with ϕ 10 intercept for the Rosário Oeste station. It is noted observed that the values presented by the model were similar to the observed values.
The daily average linimetric quotas observed and estimated by the TARSO model(2,1,0,3,1,1) with ϕ 10 and ϕ 20 intercept for Rosário Oeste station are presented in Figure 7, in which similarity is observed between the results obtained by the model and the observed values.
The table 4 presents the MAPE and MSE of the TARSO models (2,1,0,3,1,1), in which there is a good predictive capacity of the two models, but the model with intercept shows a better performance considering the first regime ϕ 10 .

Conclusions
Forecasting the stage of the Cuiabá river basin is crucial to help the Civil Defense of Mato Grosso and many other authorities concerned with the anticipation and mitigation of natural disasters. Due to the history of flooding in the Cuiabá river basin and its complex dynamics, linear time series models do not present good fits. Consequently, they are inappropriate for fitting seasonal time series. Accordingly, this paper considered the Threshold Autoregressive Self-exciting Open-loop (TARSO) non-linear with two regimes to study the daily quota of the Cuiabá river.
The Bayesian framework was implemented through the MCMC methods and Gibbs sampling algorithm, which is available in R software, thus the parameter estimates of the TARSO model were obtained. Among the total of 14,400 models considered in this analysis, only 38 of them present significance in all the parameters. According to selection model criteria, the TARSO model (2,1,0,3,1,1) presented the best performance, having the intercept coefficient in the first regime. This model holds a good predictive capacity for estimating the Cuiabá river basin level, being suitable for a system of flood alarm.
de Almeida et al. (2020) used the SETAR model to forecast the daily quota of the Cuiabá river. The authors supposed that the river flow dynamics depend only on its past values. The TARSO model, however, may take into account external variables. In this work, incorporating the daily precipitation in the TARSO model improved its predictive capacity, mainly in estimating high quota values.
The TARSO model presented satisfactory results in capturing the non-linear dynamics of the daily quota in the Cuiabá river. This threshold model is suitable in capturing seasonality and nonstationarity characteristics of the Cuiabá river. Furthermore, it captures the relationship between river height and precipitation. It is worth mentioning that the rainfall-runoff process presents asymmetric effects on river flow and it is adequately captured by the TARSO model with two regimes.
In TARSO model, d parameter expresses the threshold, and r parameter is the one that splits the series into two parts. In this work, these quantities are fixed and known. Future work may consider these parameters as unknown, affording more flexibility to the inference methodology.