Modelling of Dissolved Oxygen in Thi Vai River Water Incorporating Artificial Neural Network and Multivariable Regression

555643


Introduction
Water scarcity is becoming more serious worldwide each year [1]. Water resources in some areas are declining in both quantity and quality. Because water quality is directly linked with human welfare, through recreational activities (swimming and boating), municipal, industrial, and private water supplies, and agricultural uses including irrigation and livestock watering, it is considered a vital concern for mankind [2]. In addition, the assessment and management of water resources have become very complex with population growth [3]. Today, decisions on water resources management are increasingly based on model studies [4]. Therefore, the precise determination of pollutant concentrations in water is an essential requirement for effective management and legislation [5]. Numerous computational and statistical approaches have been applied to predict water quality in reservoirs. Dissolved oxygen (DO) is an important index for evaluating surface water quality because it reflects the pollution level and the state of the aquatic ecosystems of water bodies [6].
However, because different factors influence different waters, it is difficult to simulate DO concentrations by traditional mathematical methods [7]. In addition, limited water quality data and the high cost of water quality monitoring often pose serious problems for process-based modelling approaches. Moreover, the chemical, physical, and biological components of aquatic ecosystems are very complex and nonlinear. Which model and which parameters should be used to model DO remains an open question. In this regard, some traditional models, such as regression models and Artificial Neural Network (ANN) models, have been investigated and compared [8]. In one study, multiple linear regression (MLR) and ANN models were applied to forecast the daily DO variation based on water temperature (WT) and runoff in the Bow River, Canada; the results showed that the ANN model outperformed the MLR model. Another study suggested DO = -0.18WT + 0.591pH - 35.46, with a coefficient of determination ($R^2$) of 0.4, for the DO value in the River Danube [9,10].
ANNs have been described as a relatively new concept in environmental modelling. They are suitable for modelling nonlinear processes, such as the dynamics of DO in surface water [11]. Many kinds of networks can be applied, such as the Multi-Layer Perceptron (MLP) [12]; the MLP and the Adaptive Neuro Fuzzy Inference System (ANFIS) [13]; and Recurrent Neural Networks (RNN), Generalized Regression Neural Networks (GRNN), and the MLP and Radial Basis Function Network (RBFN) [14]. However, the ANN using a Backpropagation Neural Network (BPNN) is reported to be the most widely used neural network for forecasting and prediction purposes [15][16][17]. A BPNN generally consists of three layers: an input layer, a hidden layer, and an output layer [15]. Each layer consists of neurons that are connected to the neurons in the previous and following layers by connection weights. These weights are adjusted as the network is trained. Input vectors and the corresponding target vectors are used to train the BPNN until the model reaches a specified minimum error or a maximum number of epochs. BPNNs with weights, biases, a sigmoid layer, and a linear output layer can approximate any function with a finite number of discontinuities.
In recent years, several studies have been conducted on water quality simulation, including DO, using ANN models [18][19][20][21][22][23]. This method can predict water quality data with high precision and robustness. For example, an ANN model was established to predict total nitrogen, total phosphorus, total organic carbon, DO and Fe in the deep waters of Swedish lakes [24]. The efficiency of regression and ANN models in DO determination was reported in the aforementioned studies, which were conducted for specific areas. However, to date, studies on predicting DO using regression and ANN models in Vietnam are limited. Thus, this study applies an ANN with the Backpropagation Neural Network (BPNN) algorithm and multiple linear regression (MLR) analysis to model DO based on temperature, pH, turbidity, conductivity, COD, BOD, $\mathrm{NO_3^-}$ and $\mathrm{PO_4^{3-}}$. We also compared the performance statistics of the two models (coefficient of determination, $R^2$, and root mean square error, RMSE) to determine which model is better for predicting DO in water bodies.

a. Study Area: Thi Vai river is a tributary of the Dong Nai river system in Viet Nam. The Thi Vai river basin covers an area of 625 square kilometers. The river starts at Nhon Tho village, Long Thanh district, Dong Nai province, and flows through Tan Thanh district, Ba Ria-Vung Tau province, and Can Gio district in Ho Chi Minh City. It has a length of 32 km, a width of 400 to 600 m, and a depth of 12 to 20 m (about 60 m in the deepest area) [25]. Thi Vai river is a tidal estuary. Its tidal range exceeds 400 cm, and the flow is fast. The salinity of Thi Vai river varies from 24% to 32% in the rainy season, so the river water is saline.
The water quality of Thi Vai river is represented by the DO parameter, which shows a decreasing tendency. This can be explained by the fact that the river receives not only 34,000 m³ of untreated sewage from about 200 factories along the river basin but also large amounts of untreated sewage from residential and cattle-farming areas. A few locations with high pollution include the areas near the VEDAN company and Go Dau port or Phu My port [26].

c. Multiple Linear Regression Model (MLR):
The MLR model represents the relationship between dissolved oxygen (DO) and the physico-chemical parameters as a linear function of several predictors [9]. The MLR model was applied in this work to assess the impact of these parameters on DO as

$$Y = \beta_0 + \sum_{k} \beta_k x_k \qquad (1)$$

where $Y$ is the dependent variable (dissolved oxygen, DO), $x_k$ is the independent variable (the $k$th physico-chemical parameter), $\beta_0$ is the regression constant, and $\beta_k$ is the coefficient of the $k$th physico-chemical parameter.
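The linear model above can be fitted by ordinary least squares. The following is a minimal sketch using only the Python standard library; the function names `fit_mlr` and `predict_mlr` and the two-predictor toy data are hypothetical illustrations, not the software or the data used in this study.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + sum_k b_k * x_k; returns [b0, b1, ..., bK]."""
    n, k = len(X), len(X[0])
    # Design matrix with a leading column of ones for the intercept b0.
    A = [[1.0] + list(row) for row in X]
    p = k + 1
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][p] / M[r][r] for r in range(p)]


def predict_mlr(beta, row):
    """Evaluate b0 + sum_k b_k * x_k for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Toy data (invented for illustration): y = 1 + 2*x1 - 0.5*x2 exactly.
X = [[7.0, 28.0], [7.2, 29.5], [6.8, 30.0], [7.5, 27.0], [7.1, 31.0]]
y = [1 + 2 * a - 0.5 * b for a, b in X]
beta = fit_mlr(X, y)
```

With eight predictors, as in this work, each row of `X` would hold the eight measured parameters of one sample. Solving the normal equations directly is adequate for a small, well-conditioned problem; a QR-based solver is usually preferred when predictors are strongly correlated.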
The DO parameter indicates the pollution level and the self-cleaning capability of water bodies. Dissolved oxygen is also necessary for organisms such as fish, invertebrates, bacteria and plants, which use it in respiration. The biological oxygen demand (BOD) and chemical oxygen demand (COD) processes consume some of the dissolved oxygen in water and can therefore decrease DO. Oxygen is involved in the metabolism of various nitrogen-containing compounds. Nitrate and phosphorus products generate the free forms of nitrate and phosphate in the water. The various forms of nitrogen and phosphorus facilitate the growth of algae, which can change the concentration of dissolved oxygen in water. Dissolved solids in the ionic form of nitrates, phosphates, etc. also change the conductivity (EC), so EC also reflects changes in DO. Other factors affect DO as well. Biological and chemical processes can change as the pH changes, which affects the oxygen consumption of oxygen-demanding processes. In addition, DO is influenced by natural factors such as temperature, pH, salinity and algae. The solubility of oxygen in water decreases as temperature increases. Furthermore, dissolved oxygen decreases exponentially as salt levels increase.
At the same time, turbidity affects DO because it increases light absorption, which can raise the water temperature. Dissolved oxygen therefore plays an important role in the pollution assessment of the Thi Vai river basin. The most important variables affecting DO in the Thi Vai river basin were determined from the contribution coefficient given by formula (2), where $MP_{x_k}$,% is the average contribution percentage of the $k$th physico-chemical parameter to DO, $\beta_k$ is the regression coefficient of that parameter in the MLR model, and $x_k$ is the $k$th independent variable (Table 1).
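To make the idea of a contribution percentage concrete, the sketch below assumes one plausible definition: for every sample, the share of $|\beta_k x_k|$ in the sum of all absolute terms, averaged over the samples. This definition, the function name `contribution_percent`, and the toy numbers are assumptions for illustration; the exact form of formula (2) is not reproduced here.

```python
def contribution_percent(beta, X):
    """Average share (%) of |beta_k * x_k| among all absolute terms.

    ASSUMED definition, for illustration only.
    beta: slope coefficients b1..bK (intercept excluded).
    X: list of samples, each a list of K predictor values.
    """
    totals = [0.0] * len(beta)
    for row in X:
        terms = [abs(b * x) for b, x in zip(beta, row)]
        s = sum(terms)
        if s == 0.0:
            raise ValueError("all terms are zero for a sample")
        for i, t in enumerate(terms):
            totals[i] += 100.0 * t / s
    return [t / len(X) for t in totals]


# Toy example (invented values): two predictors, two samples.
shares = contribution_percent([2.0, 1.0], [[1.0, 2.0], [3.0, 6.0]])
```

Under this definition the percentages always sum to 100, so a parameter with a small coefficient can still dominate if its measured values are large.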

d. Back Propagation Neural Network: The backpropagation neural network (BPNN) is a multi-layer feed-forward network. It is trained on the training dataset, tuning the network parameters with an error back-propagation mechanism [27]. A BPNN can be composed of several layers, but the three-layer architecture BPNN I(k)-HL(m)-O(n) is the most commonly accepted: the input layer I(k) consists of k = 8 input neurons, the physico-chemical parameters in Table 1; the hidden layer HL(m) has m = 7 neurons; and the output layer O(n) has n = 1 neuron, dissolved oxygen (DO) [28]. The BPNN is trained with the Levenberg-Marquardt algorithm. The transfer function used in each neuron of the hidden layer is tansig, and the transfer function of the output layer is purelin. The learning function used is learngdm. The relative importance of the inputs is evaluated with formula (3), where $RI_x$ is the relative importance of input neuron $x$, $\sum w_{xy} w_{yz}$ is the sum of the products of the final weights of the connections from the $k$ input neurons to the $m$ hidden neurons and from the $m$ hidden neurons to the $n$ output neuron, $y$ indexes the $m$ neurons in the hidden layer, and $z$ is the output neuron (dissolved oxygen, DO).
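The architecture described above can be sketched as a forward pass with a hyperbolic-tangent hidden layer (tansig is mathematically the tanh function) and a linear output (purelin). The relative-importance function below assumes a connection-weight normalisation, in which each input's summed weight products are divided by the total over all inputs; that normalisation, the function names, and the random weights are illustrative assumptions, and training with the Levenberg-Marquardt algorithm is not shown.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """One forward pass of an I(k)-HL(m)-O(1) network.

    x: k inputs; W1: m rows of k weights; b1: m biases;
    W2: m hidden-to-output weights; b2: output bias.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]                  # tansig layer
    return sum(w * h for w, h in zip(W2, hidden)) + b2    # purelin layer


def relative_importance(W1, W2):
    """Relative importance (%) of each input from the final weights.

    ASSUMED normalisation: |sum_y w_xy * w_yz| divided by its total over x.
    """
    k = len(W1[0])
    s = [sum(W1[y][x] * W2[y] for y in range(len(W1))) for x in range(k)]
    total = sum(abs(v) for v in s)
    return [100.0 * abs(v) / total for v in s]


# Untrained 8-7-1 network with random weights, only to show the shapes.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(7)]
b1 = [rng.uniform(-1, 1) for _ in range(7)]
W2 = [rng.uniform(-1, 1) for _ in range(7)]
b2 = rng.uniform(-1, 1)
do_estimate = forward([0.5] * 8, W1, b1, W2, b2)
ri = relative_importance(W1, W2)
```

In practice the inputs would be scaled before training, and the importance values are only meaningful for the trained weights.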

e. Performance Criteria: In this study, several statistical error measures were used to assess the performance of the applied models. The root mean square error (RMSE), coefficient of determination ($R^2$) and mean absolute error (MAE) were used to indicate the goodness of fit between the observed and predicted values. These error measures are given as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$$

where $n$ is the number of observations, $Y_i$ is the $i$th observed value of dissolved oxygen (DO), $\bar{Y}$ is the average observed value of DO, and $\hat{Y}_i$ is the predicted value of DO for the $i$th observation. In addition, a paired two-sample t-test was applied to compare the predicted values ($\mathrm{DO_{pred}}$) from the MLR and BPNN I(8)-HL(7)-O(1) models (Table 2) with the observed values $\mathrm{DO_{obs}}$.
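The three error measures translate directly into code. The sketch below uses the standard definitions of RMSE, $R^2$ (as one minus the ratio of residual to total sum of squares) and MAE; the function names and the four observed and predicted values are invented for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Invented DO values in mg/L, only to exercise the functions.
do_obs = [1.0, 2.0, 3.0, 4.0]
do_pred = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(do_obs, do_pred), r_squared(do_obs, do_pred), mae(do_obs, do_pred))
```

RMSE and MAE are in the units of DO, so they can be compared across models fitted to the same data, while $R^2$ is dimensionless.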

a. DO Tendency Through Years: The water quality of Thi Vai river was surveyed from 2010 to 2016, and in none of these years did it meet the domestic water standards. The average results of dissolved oxygen (DO) and the statistical evaluation at seven locations in the Thi Vai river basin are presented in Table 2. The mean, maximum and minimum values of the DO concentration and of the other parameters indicate the range of variation and the reliability of the observed data collected from the survey locations. Moreover, statistics such as the coefficients of variation describe the range of variation in water quality at different locations in the Thi Vai river basin, which depends on climate change and hydro-meteorology in the river basin.
The DO concentration changed between survey locations according to the production and domestic activities nearby, as given in Table 3. The SW-TV-01 and SW-TV-02 locations are far from domestic and industrial areas, so these areas exhibited the highest DO concentrations in 2015 and 2016. At the five remaining locations the DO concentration is lower because these areas are affected by waste sources from domestic, farming and factory areas. The lowest DO concentrations are in the SW-TV-03 and SW-TV-04 areas, which are affected by domestic waste water and fish farming as well as by the 200 factories along the Thi Vai river basin, as given in Table 3.

Organic and Medicinal Chemistry International Journal
In general, the highest DO concentration was found at the confluence of Ba Ky canal and Thi Vai river. The lowest DO concentrations are in the VEDAN and Go Dau areas, because these areas are influenced by waste water from factories and ships. Even in these areas, however, the water quality of the river is still in an average range, and it can be improved by the closer monitoring carried out by the environmental management units in recent years.

b. DO Prediction With The Multilinear Regression Model: Using the water quality data of Thi Vai river collected from 2010 to 2016, as given in Table 2, we developed a multivariate linear relationship between DO and the physico-chemical parameters using multivariable regression techniques. The quality of the multivariable models was evaluated by calculating regression statistics such as $R^2_{\mathrm{train}}$, the Standard Error (SE), and the multiple correlation between the observed $\mathrm{DO_{obs}}$ and predicted $\mathrm{DO_{pred}}$ values.
The best linear relationship between DO and the 8 hydrological parameters can be written in the following form:

$$\mathrm{DO} = 28.777 + 3.703x_1 + 0.256x_2 \quad 0.009x_3 + 0.124x_4 - 0.00004x_5 + 0.014x_6 - 0.700x_7 + 12.730x_8 \qquad (8)$$

with $N = 37$, $R^2_{\mathrm{train}} = 0.811$, $R^2_{\mathrm{adj}} = 0.757$, RMSE = 0.320, $F_{\mathrm{sig}} = 0.000$, and SE = 0.368.

The regression gave an $R^2_{\mathrm{train}}$ of 0.81 and an RMSE of 0.32, which is satisfactory by regression-statistics standards. The contribution percentage $MP_{x_k}$,% of each parameter $x_k$ in regression equation (8) was calculated from the coefficients $\beta_k$. The most important effects on DO in the Thi Vai river basin were determined by formula (2), as depicted in Figure 3. ANOVA statistics were also used to assess the significance of the regression model, based on $F_{\mathrm{sig}} = 0.000$ at the significance level $\alpha = 0.05$.

This is shown clearly by the best-fit capability (Figure 5). In other words, the ANN model I(8)-HL(7)-O(1) showed high predictability and reliability. In the BPNN I(8)-HL(7)-O(1), the neurons of the input layer showed important effects in the training process of the network, as shown in Figure 5. The RI value of 18.65% was calculated by formula (3). The parameters can also be sorted in order of influence on DO: phosphate > conductivity > pH > COD > BOD > nitrate > turbidity.

These results suggest that the BPNN I(8)-HL(7)-O(1) model produces less error and imply that its predictability was better than that of the MLR model. This finding is consistent with previous studies [6,21,29]. To assess the efficiency of each model, a paired two-sample t-test for means was also used to evaluate the difference between the observed $\mathrm{DO_{obs}}$ and predicted $\mathrm{DO_{pred}}$ values, as given in Table 4. The results of the paired t-test showed that the difference between the MLR and BPNN models is not significant at the 95% confidence level.

In general, the water quality of Thi Vai river is at an average level relative to the regulated standards. A few locations are polluted by the discharge of effluents from residential, factory and farming areas, such as locations SW-TV-05 and SW-TV-06 in 2015. However, the water quality at locations SW-TV-05 and SW-TV-06 improved markedly in 2016, because the authorities managed the environment of Thi Vai river efficiently in that year. The effluent from the factories along the Thi Vai river basin makes the water in this area unsafe for water supply. In addition, fish farming also affects the water quality, for example at locations SW-TV-03 and SW-TV-02. Furthermore, the water quality of Thi Vai river is also affected by tidal changes.
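The paired two-sample t-test mentioned above compares two series measured at the same samples by testing whether the mean of their differences is zero. The following sketch computes the t statistic and degrees of freedom by hand; the function name `paired_t` and the four value pairs are invented for illustration and are not the study's data.

```python
import math


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


# Invented observed and predicted DO values in mg/L.
do_obs = [5.1, 4.8, 5.5, 4.9]
do_pred = [5.0, 4.9, 5.3, 4.9]
t_stat, dof = paired_t(do_obs, do_pred)
```

The difference is judged not significant at the 95% level when the absolute t statistic is below the two-sided critical value for the given degrees of freedom (3.182 for 3 degrees of freedom).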

Conclusion
The MLR and BPNN I(8)-HL(7)-O(1) models were constructed successfully to predict the DO parameter in Thi Vai river. From the MLR model, the most important effects of the physico-chemical parameters in the Thi Vai river basin were also determined. This can help environmental managers to develop regulations for environmental monitoring. The predicted $\mathrm{DO_{pred}}$ values from BPNN I(8)-HL(7)-O(1) are in good agreement with the observed $\mathrm{DO_{obs}}$ values. The BPNN proved superior to the multilinear regression (MLR) model, so the neural network I(8)-HL(7)-O(1) is more appropriate for predicting dissolved oxygen (DO). GIS techniques are also an efficient tool for producing interpolated maps with the IDW function.
A monitoring program was also established by the Centre of Environmental Technology and the Dong Nai Department of Natural Resources and Environment. The water samples used in this work were collected at 7 monitoring stations from 2010 to 2016, covering all the different areas of Thi Vai river, as shown in Figure 1. Nine physicochemical parameters, namely pH, temperature, dissolved oxygen (DO), chemical oxygen demand (COD), biological oxygen demand (BOD5), conductivity (EC), turbidity, nitrate and phosphate, were used as the data for the multivariate regression (MLR) and artificial neural network (ANN) models. The dataset was separated into a training set (80% of the total data) and a test set (20%).
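An 80/20 split of this kind can be sketched as follows. How the samples were actually assigned to the two sets is not stated, so the random shuffle, the fixed seed, and the function name `split_train_test` are assumptions made for illustration.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle row indices reproducibly and split them into train and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)   # ASSUMED: random assignment
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test


# Ten placeholder samples standing in for rows of measured parameters.
samples = [f"sample-{i}" for i in range(10)]
train_set, test_set = split_train_test(samples)
```

Fixing the seed keeps the split reproducible, so that both models are trained and tested on exactly the same samples.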

Figure 1: Sampling locations in Thi Vai river basin.
An MSE of $2.5573 \times 10^{-5}$ was obtained after training for 10000 epochs. The BPNN architecture I(8)-HL(7)-O(1) presented in Figure 2 was constructed successfully for predicting the DO values from 8 hydrological parameters measured at various sampling locations from 2010 to 2016. The most important variables affecting DO in the BPNN architecture I(8)-HL(7)-O(1) are evaluated from the weight coefficients using formula (3) [29].

Figure 3: The effect of important parameters on DO in MLR model.

Furthermore, pH dominates all the other parameters in its effect on dissolved oxygen (DO) in Thi Vai river ($MP_{x_k}$,% > 70%). The parameters can be sorted in order of influence on DO: pH > temperature > conductivity > BOD > phosphate > COD > nitrate > turbidity.

c. Predictability of BPNN Model: The BPNN with architecture I(8)-HL(7)-O(1) in Figure 2 was constructed with the Levenberg-Marquardt algorithm, with pH, temperature, COD, BOD, EC, turbidity, $\mathrm{NO_3^-}$ and $\mathrm{PO_4^{3-}}$ as the neurons of the input layer. The network was trained for 10000 epochs. The prediction results of BPNN I(8)-HL(7)-O(1) are presented in Figure 4. The $R^2_{\mathrm{train}}$ value of 0.9624 between the observed $\mathrm{DO_{obs}}$ and predicted $\mathrm{DO_{pred}}$ values is extremely high, as shown in Figure 5. This means that approximately 96.24% of the variation in DO is explained by the variation in the 8 physico-chemical parameters. Accordingly, the discrepancy between the blue line (observed $\mathrm{DO_{obs}}$) and the red line (predicted $\mathrm{DO_{pred}}$ from the ANN model) is insignificant.
Accuracy of Prediction Methods: The BPNN I(8)-HL(7)-O(1) and MLR (with k = 8) models were tested using statistics such as an $R^2_{\mathrm{train}}$ of 0.96 and an $R^2_{\mathrm{test}}$ of 0.9211 for BPNN I(8)-HL(7)-O(1), and an $R^2_{\mathrm{train}}$ of 0.811 and an $R^2_{\mathrm{test}}$ of 0.4423 for the MLR model, as shown in Figure 6. The RMSE values of the BPNN and MLR models were also used to indicate predictability. In addition, the global absolute mean errors (GAMEs) of the MLR and BPNN I(8)-HL(7)-O(1) models were close to zero, as shown in Table 4.

Figure 6: Correlation between observed $\mathrm{DO_{obs}}$ and predicted $\mathrm{DO_{pred}}$ values: a) MLR and b) BPNN.
The interpolated values resulting from the IDW function were used to map the water quality in the Thi Vai river basin. The water quality maps for 2015 and 2016 were used to compare the DO values from the models, as shown in Figure 7. In parts of Thi Vai river, in particular the VEDAN area and other areas near Phu My and the Thermal Power Plant, the water quality is in decline, and appropriate measures should be taken to minimize pollution.

Figure 7: Interpolated maps of the $\mathrm{DO_{obs}}$ and $\mathrm{DO_{pred}}$ values from the MLR and BPNN models, produced with the IDW function.

Table 1: The abbreviations for the 8 physico-chemical parameters used for predicting dissolved oxygen (DO).

Table 2: The statistical summary of the water quality parameters in Thi Vai river.

In this work, the predicted $\mathrm{DO_{pred}}$ values from the MLR and BPNN I(8)-HL(7)-O(1) models and the observed $\mathrm{DO_{obs}}$ values were used to zone the water quality of Thi Vai river using GIS techniques. The Inverse Distance Weighted (IDW) function was applied to interpolate the dissolved oxygen values of the river zones.
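Inverse distance weighting estimates the value at an unsampled point as a weighted average of the known values, with weights proportional to one over the distance raised to a power. The sketch below is a minimal planar version; the power of 2, the function name `idw`, and the two station coordinates are assumptions for illustration, since the GIS settings used for the maps are not given.

```python
import math


def idw(points, query, power=2.0):
    """Inverse distance weighted estimate at query = (x, y).

    points: list of (x, y, value) for the known stations.
    power: distance exponent (ASSUMED default of 2).
    """
    num = 0.0
    den = 0.0
    for x, y, value in points:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value            # query coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two invented stations with DO values of 4.0 and 6.0 mg/L.
stations = [(0.0, 0.0, 4.0), (2.0, 0.0, 6.0)]
midpoint = idw(stations, (1.0, 0.0))
near_first = idw(stations, (0.5, 0.0))
```

Because the weights fall off with distance, the estimate stays between the minimum and maximum of the station values and approaches the nearest station's value close to that station.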

Table 3: Average concentration of DO (mg/L) from 2010 to 2016 at the sampling locations.

Table 4: The comparison between observed $\mathrm{DO_{obs}}$ and $\mathrm{DO_{pred}}$ values from the MLR and BPNN models.

In this study, the IDW function of the GIS technique was used to interpolate the DO values surrounding the observed $\mathrm{DO_{obs}}$ and predicted $\mathrm{DO_{pred}}$ values from the MLR and BPNN I(8)-HL(7)-O(1) models.