Application of artificial neural network (ANN) for water quality index (WQI) prediction for the river Warta, Poland

The aim of this paper is to present the potential of neural network modelling for predicting the surface water quality index (WQI). Artificial neural network (ANN) modelling was performed using physicochemical parameters (TDS, chloride, TH, nitrate, and manganese) as the input layer of the model and the WQI as the output layer. The physicochemical parameters were obtained from the Chief Inspectorate of Environmental Protection (GIOŚ) for five measuring stations on the river Warta in the years 2014-2018. The best modelling results were obtained for networks with 5 neurons in the hidden layer. This is confirmed by a high correlation coefficient (0.9792 overall, and above 0.95 within each subset), a low MSE in each subset (training, test, validation), and an RMSE of 0.624507639. Additionally, the maximum percentage error of the WQI value did not exceed 4%, which confirms close agreement between the real data and the predicted values. These results show that ANN models are effective for predicting the surface water quality index and may be regarded as adequate for use in simulations by units monitoring the condition of the environment.


Introduction
Modelling and optimisation have become crucial for contemporary environmental management. Rising concern for sustainable development has prompted institutions supervising the quality of the environment to implement innovative solutions that reduce operating costs and energy use. Statistical and numerical methods are increasingly applied in environmental research, among other things for the comparison of empirical data [1,2], modelling of the cavitation erosion process [3,4,5], modelling of processes in sewage treatment plants [6], monitoring of chemical processes inside reactors [7], and forecasting models, for example of indoor air quality [8] or the amount of waste produced [9].
In particular, models are widely used for solving problems related to water quality management [10]. Decisions concerning water resource management are becoming increasingly dependent on modelling studies [11], while modelling tools are becoming more and more advanced [12].
A mathematical model often used for the assessment of water quality is the water quality index (WQI). Implementing this kind of model facilitates the standardisation of the surface water quality assessment system. This index is successfully used for water quality assessment in the US and the UK [10].
IOP Publishing, doi: 10.1088/1742-6596/2130/1/012028
Moreover, simulation modelling is also gaining recognition [9-13]. Water quality simulation offers numerous advantages, namely: low or no simulation cost, a short time requirement, a reduced demand for measuring or laboratory equipment and staff, the possibility of producing a large amount of synthetic data for analysis, the filling of gaps in data, and the checking and calibration of measuring devices [14].
Models can simulate the quality parameters that matter to the user on the basis of suitably chosen, measured input parameters. This may be essential for the assessment and prediction of more complex parameters that are more difficult to measure. Parameters that are easy to measure in water include temperature, pH, and electrical conductivity (EC), while nitrates or chemical oxygen demand, for example, are more difficult. Consequently, when creating a simulation model, the easily measured parameters may be used as input data and the parameters that are difficult to determine may be simulated [10].
In this research, the application of an artificial neural network to predict the water quality index (WQI) of the river Warta (Poland) is proposed.
Artificial Neural Networks (ANN) are computer systems composed of a collection of computational nodes, called neurons, and their connections. The way of operation of such a model mimics the operation of the human brain. The neural network has to be adjusted to solve a given problem via a learning process using typical stimulation and response with the desired reaction; thus, this differs from the traditional modelling method, where it is necessary to define an algorithm and create a program [15].
Neural networks have been the basis of numerous studies, such as neural-network-assisted water quality modelling [16], application of ANN to water quality management [17], forecasting of dissolved oxygen concentration in the Klamath River [18], of salinity and nitrate concentration in groundwater [19], and of ammonia nitrate, COD, and mineral oil concentrations using a neural network [20], as well as river flow modelling using an artificial neural network [21].
The aim of this research is to assess the quality of water in the river Warta through the calculation of WQI, and WQI prediction using ANN.

Study area
The river Warta is Poland's third-longest river, with a total length of 808.2 kilometres. Its catchment area covers 54,519 square kilometres (Figure 1). The Warta basin is located within two hydrogeological regions: most of it lies within the Kraków-Częstochowa Upland, and only a minor part in the east lies within the Nida region (XI). Land use in the Warta catchment within the voivodeship is dominated by agricultural land, which accounts for 60.7% (1910.0 km²) of the drained area. Forests and wooded land cover about half as much (968.5 km²), or 30.8% of the described area. Urbanised areas cover a total of 254.1 km², which accounts for only 8.0% of the catchment area [22].

WQI Calculation
The Water Quality Index model [25] used in this study is based on the weighted arithmetic mean method. The limits of the parameters (Si) are taken as the limits of class 2 of the rating scale. The weighting factor of each parameter is calculated and shown in Table 1 [25] on the basis of the data given in Table 2. Water quality for drinking purposes is usually classified into five categories (Table 3). The WQI was calculated in three steps:
1) In the first step, the unit (relative) weight was calculated for each chemical parameter using the equation below:
$$W_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$
where Wi is the relative weight, wi is the weight of each parameter, and n is the number of parameters. The assigned weight (wi) and relative weight (Wi) of each physicochemical parameter are summarised in Table 2.
2) In the second step, qi was computed using the equation below:
$$q_i = \frac{C_i}{S_i} \times 100$$
where qi is the quality rating, Ci is the concentration of each chemical parameter in each water sample in milligrams per litre, and Si is the WHO drinking water standard for each chemical parameter in milligrams per litre.
3) In the third step, the WQI is calculated using the equation below:
$$\mathrm{WQI} = \sum_{i=1}^{n} W_i \, q_i$$
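The three steps can be illustrated with a short Python sketch. It assumes the usual form of the weighted arithmetic index, i.e. Wi = wi / Σwi, qi = 100 · Ci / Si, and WQI = Σ Wi · qi. The function names, the weights, the standards, and the sample concentrations are hypothetical placeholders chosen for illustration; they are not the values from Tables 1 and 2.

```python
from typing import Dict


def relative_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Step 1: W_i = w_i / sum of all w_j."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def quality_rating(concentration: float, standard: float) -> float:
    """Step 2: q_i = (C_i / S_i) * 100, both values in mg/L."""
    return concentration / standard * 100.0


def water_quality_index(
    concentrations: Dict[str, float],
    standards: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Step 3: WQI = sum of W_i * q_i over all parameters."""
    rel = relative_weights(weights)
    return sum(
        rel[name] * quality_rating(concentrations[name], standards[name])
        for name in weights
    )


# Hypothetical example values (NOT taken from the paper's tables).
example_weights = {"TDS": 4.0, "chloride": 3.0, "nitrate": 5.0}
example_standards = {"TDS": 500.0, "chloride": 250.0, "nitrate": 50.0}
example_sample = {"TDS": 250.0, "chloride": 50.0, "nitrate": 10.0}

example_wqi = water_quality_index(example_sample, example_standards, example_weights)
print(f"WQI = {example_wqi:.2f}")
```

For these placeholder values the quality ratings are 50, 20, and 20, and the resulting index is approximately 30, which would fall below the WQI<50 threshold discussed in the results.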

Sampling
The results for the physicochemical parameters of the Warta are gathered in Table 2.

Artificial Neural Networks
To determine the input parameters for ANN modelling, a correlation analysis was carried out in Statistica 13. The WQI prediction was conducted using the neural network library in the MATLAB and Simulink computing environments. The physicochemical parameters were the inputs to the model (input neurons), whereas the output neuron was the WQI. The Neural Network Fitting app was used for the modelling. Networks were created with one hidden layer. The network was selected on the basis of the mean square error (MSE) and the regression (R) value, by varying the number of neurons in the hidden layer from 2 to 10 and the learning algorithm (Levenberg-Marquardt, Bayesian Regularisation, and Scaled Conjugate Gradient). The higher the regression value and the lower the MSE, the better the quality of the generated network. The data set was divided into training (70%), testing (15%), and validation (15%) subsets.
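The data division and the selection rule can be sketched in Python as follows. This is not the MATLAB workflow used in the study, only an illustration under stated assumptions: the split is a seeded random shuffle, the candidate with the lowest validation MSE is preferred and ties are broken by the higher R, and the candidate results listed are hypothetical placeholders rather than values from Table 6.

```python
import random
from typing import List, Tuple

Candidate = Tuple[int, str, float, float]  # (hidden neurons, algorithm, MSE, R)


def split_indices(
    n_samples: int, seed: int = 0, train_frac: float = 0.70, val_frac: float = 0.15
) -> Tuple[List[int], List[int], List[int]]:
    """Randomly divide sample indices into training, validation and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 15%
    return train, val, test


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Prefer the lowest MSE; break ties with the highest R."""
    return min(candidates, key=lambda c: (c[2], -c[3]))


train_idx, val_idx, test_idx = split_indices(100)

# Hypothetical results for a few candidate networks (placeholders only).
example_candidates = [
    (2, "Levenberg-Marquardt", 1.90, 0.95),
    (5, "Levenberg-Marquardt", 0.80, 0.98),
    (10, "Scaled Conjugate Gradient", 1.20, 0.96),
]
best_candidate = pick_best(example_candidates)
print(len(train_idx), len(val_idx), len(test_idx), best_candidate)
```

In practice the loop over 2 to 10 hidden neurons and the three learning algorithms would produce the list of candidates; only the ranking step is shown here.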

The Water Quality Index
The calculated WQI is presented in Table 4. All the chosen measuring points on the Warta are characterised by WQI<50, which indicates water of excellent quality. None of the described measuring stations indicated poor-quality water, even though the descriptive report by the Regional Inspectorate of Environmental Protection in Katowice indicates poor water quality for the given catchment areas treated as surface water bodies. The report emphasises that the particular groups of assessed parameters in these catchment areas vary over quite a wide range: there are catchment areas where the assessed parameters are of the highest quality (class I), but also those of the lowest quality, classified as class IV. This quality status of the Warta's waters is a result of diverse anthropogenic impacts on the environment [24]. Agricultural activity is the most important of these, the major issues being the fertilisation of fields and the use of pesticides. These substances are carried with rainwater into surface waters, leading to their pollution. For the Warta itself, it is also crucial that the river flows through the urban areas of Zawiercie, Myszków, and Częstochowa. Moreover, the reservoir in Poraj plays a vital role in transforming the physicochemical parameters of the flowing waters. This complex catchment situation leads to significant differences in surface water quality between individual watercourses [27].
This kind of diversity of parameters of the quality of surface waters may be challenging for research stations. The application of simulation models may facilitate the operation of such measuring entities.

Artificial Neural Networks
In the first step, the correlation matrix between the variables was examined (Table 5). The input parameters for ANN modelling were selected on the basis of the correlation analysis (correlation coefficient above 0.5). The following parameters were chosen: total dissolved solids (TDS), chloride, total hardness (TH), nitrate, and manganese. A schematic representation of the ANN is shown in Figure 2.
Figure 2. Schematic representation of an artificial neural network.
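A correlation-based selection of inputs of this kind can be sketched in Python. The sketch assumes that each candidate parameter is compared with the WQI using the Pearson coefficient and that the absolute value is tested against the 0.5 threshold; the text does not state these details, so both are assumptions. The function names and the data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_inputs(
    columns: Dict[str, Sequence[float]], target: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Keep the parameters whose |r| with the target exceeds the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson_r(values, target)) > threshold
    ]


# Hypothetical measurements (placeholders, not the data behind Table 5).
example_wqi_series = [10.0, 20.0, 30.0, 40.0, 50.0]
example_columns = {
    "TDS": [1.0, 2.0, 3.0, 4.0, 5.0],  # strongly correlated with the target
    "pH": [7.1, 6.9, 7.0, 6.9, 7.1],   # essentially uncorrelated
}
selected_inputs = select_inputs(example_columns, example_wqi_series)
print(selected_inputs)
```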
The best network was obtained after 19 iterations; it is the network with five neurons in the hidden layer (Figure 3). Validation performance in terms of MSE is presented in Figure 4: the best validation performance was reached at iteration 7, with a value of 0.77235. Figure 5 presents the gradient (the pace of error decrease) at each iteration, the momentum (Mu), and the number of consecutive increases of the MSE for the validation set. After six consecutive increases of the validation MSE, the learning process of the network is stopped. Table 6 presents the results of the network's learning process (MSE and regression R value), divided into training, testing, and validation subsets. Regression statistics for the particular subsets (training R=0.9988, validation R=0.9554, testing R=0.9678) are presented in Figure 6. The overall regression reached 0.9792; therefore R>0.95 in each case, which indicates a very good fit of the network and a high level of agreement with the measured points.
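The stopping rule based on validation failures can be illustrated with a minimal Python sketch. It reads "six consecutive increases" as six consecutive iterations in which the validation MSE does not improve on the best value seen so far; this reading, the function name, and the validation curve below are assumptions made for illustration, not output of the network described here.

```python
from typing import Sequence, Tuple


def early_stopping(validation_mse: Sequence[float], max_fail: int = 6) -> Tuple[int, int]:
    """Return (best_iteration, stop_iteration) for a validation-MSE curve.

    Training stops once the validation MSE has failed to improve on its best
    value for `max_fail` consecutive iterations.
    """
    best_iteration = 0
    best_value = validation_mse[0]
    fails = 0
    for iteration in range(1, len(validation_mse)):
        value = validation_mse[iteration]
        if value < best_value:
            best_value = value
            best_iteration = iteration
            fails = 0  # an improvement resets the failure counter
        else:
            fails += 1
            if fails >= max_fail:
                return best_iteration, iteration
    # The curve ended before the failure limit was reached.
    return best_iteration, len(validation_mse) - 1


# Hypothetical validation curve (placeholder values).
example_curve = [5.0, 3.0, 2.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2]
best_it, stop_it = early_stopping(example_curve)
print(best_it, stop_it)
```

The weights kept are those from the best iteration, not from the iteration at which training stopped, which is why the two indices are returned separately.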
After modelling, a Simulink diagram was generated (Figure 7), which enables prediction of the WQI once data on TDS, chloride, TH, nitrate, and manganese are provided.
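What such a prediction block computes can be sketched as a forward pass through a network with five inputs, five hidden neurons, and one output. The sketch assumes a hyperbolic tangent activation in the hidden layer, a linear output neuron, and inputs that have already been scaled; none of these details are stated in the text. All weights and biases below are hypothetical placeholders, not the trained parameters of the network.

```python
import math
from typing import Sequence


def predict_wqi(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Placeholder parameters for a 5-5-1 network (NOT trained values).
example_hidden_weights = [[0.1] * 5 for _ in range(5)]
example_hidden_biases = [0.0] * 5
example_output_weights = [1.0] * 5
example_output_bias = 30.0

# Scaled inputs in the order TDS, chloride, TH, nitrate, manganese (hypothetical).
example_inputs = [1.0, 1.0, 1.0, 1.0, 1.0]
example_prediction = predict_wqi(
    example_inputs,
    example_hidden_weights,
    example_hidden_biases,
    example_output_weights,
    example_output_bias,
)
print(example_prediction)
```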
As a result of the ANN modelling, the WQI was predicted as a function of the physicochemical parameters TDS, chloride, TH, nitrate, and manganese. Figure 8 presents a comparison between the real data and the values predicted by the ANN. Given the slight discrepancies between the real data and the predicted values (RMSE=0.624507639), the described ANN models present an acceptable level of error and may therefore be applied as reliable predictors in the determination of the WQI [28,29]. According to the recent literature, a growing number of studies apply neural network models to predict the quality of both surface water and groundwater using different input variables (water physicochemical parameters). Two indices, R and MSE, are used to confirm the choice of the neural network. In this paper, the value of R was 0.9792.
Vasanthi and Kumar forecasted the water quality index for the river Palayar using the following quality parameters as input data: DO, TDS, SAR, BOD5, and HCO3. The parameters characterising the network indicate the possibility of using the ANN to predict the water quality index. It is a particularly useful tool for predicting the water quality of rivers.
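The error measures used to compare real and predicted WQI values can be computed as in the following Python sketch. The RMSE is the square root of the mean squared difference, and the percentage error is taken here relative to the real value, which is an assumption because the paper does not define it. The function names and the data are hypothetical and do not reproduce Figure 8.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between real and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def max_percentage_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Largest absolute error expressed as a percentage of the real value."""
    return max(abs(o - p) / abs(o) * 100.0 for o, p in zip(observed, predicted))


# Hypothetical real and predicted WQI values (placeholders only).
example_observed = [20.0, 25.0, 40.0]
example_predicted = [20.5, 24.5, 41.0]

example_rmse = rmse(example_observed, example_predicted)
example_max_pct = max_percentage_error(example_observed, example_predicted)
print(f"RMSE = {example_rmse:.4f}, max error = {example_max_pct:.2f}%")
```

Because the RMSE is expressed in WQI units while the percentage error is relative, the two measures complement each other: a small RMSE alone would not rule out a large relative error at low WQI values.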

Conclusions
Neural network models can be an effective tool for predicting the surface water quality index (WQI). The modelled neural network determines the relationships between the input data, i.e. the physicochemical parameters of surface waters (TDS, chloride, TH, nitrate, and manganese), and the output data, i.e. the WQI. The best modelling results were obtained for a network with 5 neurons in the hidden layer. A high correlation coefficient (0.9792 overall, and above 0.95 in the individual subsets), a low MSE in each of the subsets (training, test, and validation), and an RMSE of 0.6245 were achieved. Additionally, the maximum percentage error of the WQI value did not exceed 4%; the models may therefore be recognised as sound predictors for the tested data.
The flexibility of the neural network structure allows water quality to be predicted using fewer physicochemical parameters than would be necessary for the analytical determination of this index. The number of necessary physicochemical parameters was reduced from 11 to 5. The use of this type of tool is desirable because acquiring real data is time-consuming and costly.