HIGH PERFORMANCE APPROACH FOR WATER LEVEL FORECASTING IN YOM RIVER BASIN OF THAILAND

This study analyses average monthly water level (AMWL) time series (April 2007 to March 2020) at four water level measurement stations (Y.31, Y.20, Y.1C, Y.37) for the wet and dry seasons in the Yom River basin of Thailand. Using the Box-Jenkins method, eight best-fit seasonal ARIMA (SARIMA) models for one-hydrological-year forecasting of wet- and dry-season AMWL were selected from twenty-four possible models by the minimum values of AIC, SBIC, RMSE, and MAPE. In addition, the selected models were compared with two benchmark models, Holt-Winters’ seasonal additive method and the seasonal naïve method. The results indicate that the four selected SARIMA models for the wet seasons of Y.31, Y.20, Y.1C, and Y.37 are SARIMA(1,1,1)(1,0,0)[6], SARIMA(1,1,1)(1,0,0)[6], SARIMA(1,1,2)(1,0,0)[6], and SARIMA(1,1,1)(1,0,0)[6], respectively, while the models for the dry seasons are SARIMA(0,1,1)(2,0,0)[6], SARIMA(1,1,1)(1,0,1)[6], SARIMA(1,1,1)(2,0,0)[6], and SARIMA(1,0,0)(1,0,1)[6]. Forecasting performance was judged by the minimum values of RMSE and MAPE between SARIMA and the benchmark models. The SARIMA model is the best approach for Y.31 Station [Wet Season], Y.31 Station [Dry Season], and Y.37 Station [Dry Season], while the best method for Y.20 Station [Wet Season], Y.1C Station [Wet Season], Y.37 Station [Wet Season], Y.20 Station [Dry Season], and Y.1C Station [Dry Season] is Holt-Winters’ seasonal additive method. The upstream station (Y.31) is forecast more accurately than the downstream station (Y.37) because human activities disturb the hydrological regime downstream. Furthermore, dry season forecasts are more accurate than wet season forecasts.


T. PANITYAKUL, P. KHWANMUANG, R. CHINRAM, P. JOMSRI
… the additive Holt-Winters method. Sopipan (2014) [10] studied the forecasting of historical monthly rainfall data from April 2005 to March 2013 in Nakhon Ratchasima Province, Thailand, using the autoregressive integrated moving average (ARIMA) and multiplicative Holt-Winters methods; the mean absolute percentage error (MAPE), mean squared error (MSE), and mean absolute error (MAE) were used to measure performance. Forecasts from both methods were found to be acceptable, but ARIMA gave a better result for that case. Leo et al. (2016) [11] evaluated the potential of Random Forest to make streamflow forecasts at a 1-day lead time in the Pacific Northwest, benchmarking its performance against simple naïve and multiple linear regression models using the Pearson correlation coefficient (r) between forecasted and observed values for each model.
This study proposes appropriate forecasting models for the average monthly water level (AMWL) time series of the Yom River basin in Northern Thailand. Using the Box-Jenkins method, best-fit seasonal autoregressive integrated moving average (SARIMA) models for one-hydrological-year forecasting of wet- and dry-season AMWL are selected and compared with two benchmark models, Holt-Winters' seasonal additive method and the seasonal naïve method. The study period is from April 2007 to March 2020, a span of thirteen hydrological years.

STUDY REGION AND DATASET
The Yom River basin is located in Northern Thailand. The basin covers a surface area of approximately 24,046.89 km², between latitudes 14°50' N and 18°25' … Phichit province; it joins the Nan River in Nakhon Sawan province at a low slope of 20-50 m (MSL). The length of the Yom River is approximately 735 km [4,12].
In 2014, the Yom River basin received an average annual precipitation of about 1,179 mm and an average annual runoff of about 5,261 million m³. The average annual runoff per person was less than 2,500 m³ per year, which is below the average annual runoff per person in Thailand (3,496 m³). In 2019, the Yom River basin had one large-sized reservoir and five medium-sized reservoirs with a total storage capacity of 295.62 million m³ [1,4,12,13] [14,15,16,17,18], as shown in Fig. 1. The AMWL data at the four previously listed water level measurement stations were calculated for the wet and dry seasons of Thailand; the wet season is from May to October and the dry season is from November to April.

An ARIMA model is written ARIMA($p$, $d$, $q$), where $p$, $d$, and $q$ are non-negative integers. In this notation, $p$ refers to the order of the autoregressive (AR) part, $d$ refers to the order of the regular differencing (I) part, and $q$ refers to the order of the moving average (MA) part [8,19].
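AR($p$) is an autoregressive model of order $p$ and is represented by:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

MA($q$) is the moving average model of order $q$ and is represented by:

$$y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$

where
$y_t$ is the AMWL at time $t$,
$\phi_i$ is the $i$th autoregressive parameter,
$\theta_j$ is the $j$th moving average parameter,
$\varepsilon_t$ is the independent random variable that represents the error term at time $t$.

The combination of the autoregressive model and the moving average model is called the mixed autoregressive moving average or ARMA($p$, $q$) model and is represented by:

$$y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}$$

The ARIMA model assumes that the $d$th order differenced process follows the ARMA($p$, $q$) model, where the $d$th order differenced process is the process differenced $d$ times.

Generally, the $d$th order differencing process is written $(1 - B)^d$, where $B$ is the backshift operator ($B y_t = y_{t-1}$). Writing $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$, the ARIMA($p$, $d$, $q$) model is represented by:

$$\phi_p(B)(1 - B)^d y_t = \theta_q(B)\,\varepsilon_t$$

Box and Jenkins then generalized the above model and developed the multiplicative seasonal ARIMA model. The seasonal ARIMA($P$, $D$, $Q$)$_s$ model is represented by:

$$\Phi_P(B^s)(1 - B^s)^D y_t = \Theta_Q(B^s)\,u_t \qquad (6)$$

where
$s$ is the seasonality,
$D$ is the order of the seasonal differencing process,
$P$ is the order of the seasonal autoregressive polynomial,
$Q$ is the order of the seasonal moving average polynomial,
$u_t$ is the residual series.

For the residuals $u_t$, the ARIMA($p$, $d$, $q$) model is represented by:

$$\phi_p(B)(1 - B)^d u_t = \theta_q(B)\,\varepsilon_t \qquad (7)$$

Solving equation (7) for $u_t$ and substituting into equation (6) gives the multiplicative seasonal autoregressive integrated moving average model, or SARIMA($p$, $d$, $q$) × ($P$, $D$, $Q$)$_s$ model, which is represented by:

$$\phi_p(B)\,\Phi_P(B^s)(1 - B)^d (1 - B^s)^D y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \qquad (8)$$

where
$(1 - B^s)^D$ is the $D$th order seasonal differencing process,
$\varepsilon_t$ is the independent random variable that represents the error term at time $t$.

The Box-Jenkins methodology used in time series analysis has three steps. The first step, model identification, determines which model should be used for the time series. The second step, parameter estimation, estimates the parameters of the model specified in the first step. The third step, diagnostic testing, evaluates the model specified in the first step using the estimation results from the second step. Once the best-fit model has been obtained, it is used to forecast the time series [19,20].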
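The regular and seasonal differencing operators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `difference` and the toy series are hypothetical, and the seasonal period of 6 simply mirrors the [6] in the selected models.

```python
def difference(series, lag=1, order=1):
    """Apply the operator (1 - B^lag) to `series`, `order` times.

    Each pass subtracts the value `lag` steps earlier and therefore
    drops the first `lag` points of the series.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Hypothetical toy series (not the AMWL data): a linear trend plus a
# pattern that repeats every 6 points.
pattern = [0.0, 1.0, 3.0, 2.0, 1.0, 0.5]
toy = [0.1 * t + pattern[t % 6] for t in range(24)]

regular = difference(toy, lag=1, order=1)   # (1 - B) y_t, i.e. d = 1
seasonal = difference(toy, lag=6, order=1)  # (1 - B^6) y_t, i.e. D = 1 with s = 6
```

On this toy series the seasonal difference removes the repeating pattern and leaves only the constant trend increment over six steps, which is the behaviour that differencing is meant to achieve before an ARMA model is fitted.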

Model Identification.
The Box-Jenkins methodology requires stationary time series data, so stationarity is considered first. A stationary time series $y_t$ is one whose statistical properties, such as mean, variance, and autocorrelation, are all constant over time.
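Identification typically relies on inspecting the sample autocorrelation function (ACF) of the series. The sketch below computes it with the usual estimator (lag-k autocovariance divided by lag-0 autocovariance); the function name `sample_acf` and the example series are assumptions made for illustration only.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0, r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical alternating series, used only to show the call.
example_acf = sample_acf([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
```

A series whose autocorrelations decay quickly toward zero is consistent with stationarity, whereas slowly decaying autocorrelations usually suggest that differencing is still needed.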

Diagnostic Testing.
The criterion is that the independent random variable ($\varepsilon_t$) of the selected model must qualify as white noise: zero mean, variance constant over time, and independence from other periods. Because $\varepsilon_t$ cannot be observed, the residual ($e_t$) obtained in the second step is tested in its place, where $e_t = y_t - \hat{y}_t$. In practice, the residuals may qualify as white noise in more than one model. The following considerations are then used to determine the best-fit model:
1) The coefficients of the selected model are all statistically significant.
2) Among the models whose residuals qualify as white noise, the model with the lowest forecast error is selected based on Akaike's information criterion (AIC) [21], considered in conjunction with Sawa's Bayesian information criterion (SBIC); the model with the lowest values of these criteria is the most suitable.

$$\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2k$$

$$\mathrm{SBIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2(k + 2)\,\frac{n\hat{\sigma}^2}{\mathrm{SSE}} - 2\left(\frac{n\hat{\sigma}^2}{\mathrm{SSE}}\right)^2$$

where
$k$ is the number of estimated parameters in the model,
$n$ is the number of observations of AMWL data,
$\mathrm{SSE}$ is the sum of squared errors of the model,
$\hat{\sigma}^2$ is the error variance.
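As a hedged illustration of how the two criteria rank candidate models, the sketch below assumes the least-squares forms AIC = n ln(SSE/n) + 2k and Sawa's BIC = n ln(SSE/n) + 2(k + 2)q - 2q², with q = nσ̂²/SSE. These forms are an assumption chosen to match the quantities listed above (k, n, SSE, σ̂²), and the function names and candidate numbers are hypothetical, not results from this study.

```python
import math


def aic(sse, n, k):
    """Least-squares form of Akaike's information criterion (assumed form)."""
    return n * math.log(sse / n) + 2 * k


def sawa_bic(sse, n, k, sigma2):
    """Sawa's Bayesian information criterion (assumed form).

    `sigma2` is the error variance estimate supplied by the caller.
    """
    q = n * sigma2 / sse
    return n * math.log(sse / n) + 2 * (k + 2) * q - 2 * q * q


# Hypothetical candidates: (label, SSE, number of estimated parameters).
candidates = [("model A", 12.0, 3), ("model B", 11.5, 5)]
n_obs = 78        # hypothetical number of observations
sigma2_hat = 0.15  # hypothetical error variance estimate

scores = {
    label: (aic(sse, n_obs, k), sawa_bic(sse, n_obs, k, sigma2_hat))
    for label, sse, k in candidates
}
best_by_aic = min(scores, key=lambda label: scores[label][0])
```

Both criteria reward a small SSE and penalize extra parameters, so a model with a slightly better fit can still lose to a more parsimonious one.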

Seasonal Naïve method.
The naïve method sets all forecasts to the value of the last observation. Because the naïve forecast is optimal when the data follow a random walk, these forecasts are also called random walk forecasts. The seasonal naïve method is a similar method that is useful for highly seasonal data: each forecast is set equal to the last observed value from the same season of the year [24].
Formally, the forecast for time $T + h$ is written as

$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$

where
$\hat{y}_{T+h|T}$ is the forecast AMWL at time $T + h$,
$m$ is the frequency of the seasonality,
$k$ is the integer part of $(h-1)/m$.
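The seasonal naïve rule is simple enough to state directly in code. The sketch below is a minimal illustration in which each forecast at horizon h copies the observation m(k+1) steps before T + h, with k the integer part of (h-1)/m; the function name and the toy history are hypothetical, and m = 6 is used only because each season in this study spans six months.

```python
def seasonal_naive(history, horizon, m):
    """Forecast `horizon` steps ahead by repeating the last full season.

    `history` holds y_1..y_T in order; `m` is the seasonal period.
    """
    T = len(history)
    if T < m:
        raise ValueError("need at least one full season of history")
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m  # integer part of (h - 1) / m
        # y_{T+h-m(k+1)} in 1-based notation is index T+h-m(k+1)-1 here.
        forecasts.append(history[T + h - m * (k + 1) - 1])
    return forecasts


# Hypothetical history of 12 values (two seasons of 6), not the AMWL data.
toy_history = list(range(1, 13))
toy_forecast = seasonal_naive(toy_history, horizon=8, m=6)
```

For horizons longer than one season the forecasts simply cycle through the last observed season again, which is why the method serves as a baseline rather than a model of the dynamics.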

Accuracy and Performance.
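If the root mean square error (RMSE) and mean absolute percentage error (MAPE) values are small, the forecast value of AMWL is highly accurate.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where
$\hat{y}_i$ is the forecast value of AMWL data at $i$; $i = 1, 2, \dots, n$,
$y_i$ is the observed value of AMWL data at $i$; $i = 1, 2, \dots, n$,
$n$ is the number of observations of AMWL data.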
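The two accuracy measures can be computed as follows. This is a minimal sketch using the standard definitions of RMSE and MAPE (with MAPE expressed in percent); the function names and the sample numbers are hypothetical and are not values from this study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between observed and forecast values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when an observed value is zero.
    """
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


# Hypothetical observed and forecast values, only to show the calls.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
example_rmse = rmse(observed, predicted)
example_mape = mape(observed, predicted)
```

RMSE is expressed in the units of the data and weights large errors more heavily, while MAPE is scale-free, which makes it easier to compare stations with different water level ranges.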
Cross-validation is one method of dividing data in order to test the performance of different prediction models. The basic concept is to split the data into multiple parts and use some of the data to predict the rest. K-fold cross-validation is the most popular cross-validation procedure. In K-fold cross-validation, the original data are first partitioned into K equal-sized subsamples. Second, K−1 subsamples are treated as the training data, i.e. the dataset used to fit the model, and the remaining subsample as the testing data, i.e. the values compared with the predictions from the trained model. Finally, this process is repeated K times over the K subsamples, and the average performance of the K- …

… transformation with the first-order differencing process. The assumption of stationarity is shown in Table 1, and the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots with 99% confidence limits are shown in Fig. 4. Although some positive and negative spikes of both the ACF and PACF plots fall outside the 99% confidence limits, the Ljung-Box test of the time series is non-significant (the p-value is greater than the significance level $\alpha = 0.01$).
Therefore, the time series is stationary.
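For readers who want to reproduce the Ljung-Box check, the test statistic can be sketched as below, using the standard definition Q = n(n+2) Σ r_k²/(n−k) summed over lags 1 to h. Only the statistic is computed: the p-value additionally requires the chi-square distribution, which is left out to keep the sketch free of external libraries. The function name and the example series are hypothetical.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    q_stat = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0  # lag-k autocorrelation
        q_stat += r_k * r_k / (n - k)
    return n * (n + 2) * q_stat


# Hypothetical alternating series, used only to show the call.
example_q = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], max_lag=1)
```

A large Q relative to the chi-square reference distribution indicates remaining autocorrelation; a non-significant Q is what the text above reports for the transformed series.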
From the twenty-four possible SARIMA models, eight best-fit SARIMA models of AMWL data were selected for the four water level measurement stations and the wet and dry seasons. The parameter estimates (coefficients) of the SARIMA models are shown in Table 2, the tests of the residual assumptions in Table 3, the ACF plots of the residuals with 99% confidence limits in Fig. 5, and the performance and accuracy of the best-fit SARIMA models in Table 4. … than the significance level $\alpha = 0.01$). Therefore, the parameter estimates were included in the model, as shown in Table 2.

Fig. 5 The ACF plots of the residuals, with 99% confidence limits, of the best-fit SARIMA models of AMWL data of all four water level measurement stations for wet and dry seasons.
The assumptions on the residuals of AMWL data of all four water level measurement stations for wet and dry seasons were checked as follows: zero mean was tested with the one-sample t-test, constant variance with Bartlett's test, and absence of autocorrelation with the Ljung-Box test, as shown in Table 3. All three tests were non-significant (the p-value is greater than the significance level $\alpha = 0.01$), and most positive and negative spikes of the ACF plots of the residuals fall within the 99% confidence limits, as shown in Fig. 5. Therefore, the residuals satisfy the white noise assumption. The performance and accuracy of the best-fit SARIMA models are shown in Table 4.

Table 4 The relative performance and accuracy of the best-fit SARIMA models of AMWL data of all four water level measurement stations for wet and dry seasons.

Table 5 The equation of the best-fit SARIMA models of AMWL data of all four water level measurement stations for wet and dry seasons.

The forecasting equations of the best-fit SARIMA models are given in Table 5. They combine the coefficients from Table 2 with the historical AMWL time series, which differs for each station. The historical time series enters according to the parameters of the SARIMA($p$, $d$, $q$) × ($P$, $D$, $Q$)$_s$ model in equation (8). … Table 6 and Fig. 6. However, the Box-Jenkins method is a time series analysis based on a stochastic process and yields forecast values without an overfitting problem.
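To show how a SARIMA specification turns into a forecasting equation on past values, the sketch below expands the operator product (1 − φB)(1 − ΦB^s)(1 − B)^d for a SARIMA(1,1,1)(1,0,0)[6] model by polynomial multiplication. It is an illustration under assumptions: the coefficient values are hypothetical (they are not the Table 2 estimates), the function names are invented for this example, and the moving average term is assumed to enter as ε_t − θ ε_{t−1}.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sarima_ar_side(phi, seasonal_phi, s, d=1):
    """Coefficients of (1 - phi B)(1 - Phi B^s)(1 - B)^d in powers of B."""
    poly = poly_mul([1.0, -phi], [1.0] + [0.0] * (s - 1) + [-seasonal_phi])
    for _ in range(d):
        poly = poly_mul(poly, [1.0, -1.0])
    return poly


# Hypothetical coefficients, chosen only for illustration.
coeffs = sarima_ar_side(phi=0.5, seasonal_phi=0.2, s=6, d=1)

# Moving every lagged term to the right-hand side gives
# y_t = sum_j weight_j * y_{t-j} + eps_t - theta * eps_{t-1}.
lag_weights = {j: -c for j, c in enumerate(coeffs) if j >= 1 and abs(c) > 1e-12}
```

With these hypothetical values the equation uses lags 1, 2, 6, 7, and 8 of the series, which illustrates why a model with a six-period seasonal term needs at least eight past observations to produce a forecast.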

Comparisons of Forecasting
The forecasts of the three methods for AMWL data at all four water level measurement stations are shown in Fig. 7 for the wet season and in Fig. 8 for the dry season.

CONCLUSIONS
… respectively. These models were selected from more than 24 models based on criteria that included minimum values of AIC and SBIC. … seasonal additive method. However, the Box-Jenkins method is a time series analysis based on a stochastic process and yields forecast values without an overfitting problem.

5.3
The upstream station (Y.31) is forecast more accurately than the downstream station (Y.37) because human activities (reservoirs, and water use for agriculture and consumption) disturb the hydrological regime. Human activities and climatic changes cause data uncertainty in forecasting.

5.4
Dry season forecasts are more accurate than wet season forecasts at all four water level measurement stations, because the seasonal variation of the AMWL time series is more complicated in the wet season than in the dry season and the dispersion of the data is wider in the wet season than in the dry season.