Use of Artificial Neural Network method to Predict the Amount of Oxygen in the Tigris River

The neural network algorithm is one of the most important prediction algorithms in artificial intelligence. Given the importance of the Tigris River to the lives of the Iraqi people and of the people of the other countries it passes through, we developed a neural-network-based algorithm to measure the proportion of oxygen in the water, which reflects the quality and purity of the water and its suitability for human use. We built a system that predicts Chemical Oxygen Demand, Biochemical Oxygen Demand and Dissolved Oxygen from a set of parameters that form the input layer. A weight is found for each input, and the neural network function then generates the output layer, which contains Chemical Oxygen Demand, Biochemical Oxygen Demand and Dissolved Oxygen together as a single output. To test the results and performance of the system, the application provides an interface that evaluates it by calculating three statistical parameters: the correlation coefficient (r), the root mean square error (RMSE) and the mean absolute percentage error (MAPE).


Introduction
Artificial intelligence is one of the most important fields of computer science and has entered many areas of life. It has helped a great deal in solving complex problems, especially by predicting future results and comparing them with actual results in order to reach a solution. The neural network algorithm is one of the most important prediction algorithms in artificial intelligence. Given the importance of the Tigris River to the lives of the Iraqi people and of the people of the other countries it passes through, we developed a neural-network-based algorithm to measure the proportion of oxygen in the water and its impact on the quality and purity of the water and on its suitability for human use. Oxygen in water is generally described by three basic measures: Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD) and Dissolved Oxygen (DO). We do not discuss these in detail, because many sources already explain them and their importance; we describe them briefly before focusing on the algorithm adopted to predict them.
IOP Publishing doi:10.1088/1757-899X/1076/1/012033
COD is the amount of oxygen consumed chemically (in milligrams per liter, mg/L) to oxidize organic matter into inorganic substances under specified conditions of time and temperature in the presence of an oxidizing agent. BOD is the amount of oxygen consumed biologically by microorganisms during their activity at a constant temperature over a specified period, called the incubation period; the more oxygen consumed, the more contaminated the water.
The amount of oxygen consumed biologically depends on the following factors: the nature of the water and how prone its contents are to degradation; the amount of oxygen dissolved in the water; the amount of nutrients available to the microorganisms; the water temperature; the water pH, preferably between 6 and 8; and whether the decomposition of materials may be inhibited. Note that the higher the BOD, the greater the decrease in COD, because bacterial oxidation reduces the amount of organic matter available and therefore the amount of oxygen required to oxidize it chemically [1].

Related Works
Numerous studies have used neural networks to analyze river and sea water and obtain accurate predictions. In 2012, Nasr, Moustafa, Seif and El Kobro designed and implemented an ANN system to predict the performance of the El-Agamy wastewater treatment plant (WWTP) in Alexandria in terms of Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD) and Total Suspended Solids (TSS); the ANN predicted the plant performance with a correlation coefficient of up to 0.90 between the observed and predicted output variables [2]. Emamgholizadeh, Kashi, Marofpoor and Zalaghi used artificial intelligence techniques to develop Multilayer Perceptron (MLP), Radial Basis Network and Adaptive Neuro-Fuzzy Inference System (ANFIS) models that compute dissolved oxygen (DO), biochemical oxygen demand (BOD) and chemical oxygen demand (COD) for the Karoon River in Iran [3]. In 2015, researchers from Szent István University, in collaboration with a researcher from Eötvös Loránd University in Hungary, used neural networks to predict the water quality of the Danube, the second-largest river in Europe, from inputs such as pH, runoff, temperature and electrical conductivity [4]. In 2017, Harun Türkmenler and Murat Pala of Adıyaman University in Turkey developed an application that predicts the performance of a wastewater treatment plant using a back-propagation learning algorithm. They evaluated the ANN-predicted effluent BOD concentrations using the mean absolute percentage error (MAPE), the sum of squared errors (SSE), the absolute fraction of variance (R2), the root mean square (RMS) and the coefficient of variation in percent (COV). The R2 values were 94.13% and 93.18% for the training and test sets of the treatment plant process, respectively, and the ANN model was found to be suitable for estimating the daily BOD in the effluent of biological wastewater treatment plants [5].
In our research, we use a neural network to predict BOD, COD and DO for the Tigris River, one of the most important rivers in Iraq.

Artificial Neural Networks
The human mind has been the basis for building societies, and neural networks were deliberately designed to simulate the way it works in order to benefit from its way of thinking; an artificial neuron works much like a human neuron. An ANN is a collection of connected nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like a synapse in a biological brain, can transmit a signal to other neurons. An artificial neuron receives a signal, processes it and can then signal the neurons connected to it [6]. Neural networks work in two stages: a training stage and a testing stage. In the training stage, the network is given inputs together with their known outputs and computes its own outputs from the inputs using a set of weights. It then compares the computed outputs with the known outputs.
In the testing stage, only the inputs and the weights are given, and the system produces outputs based on the knowledge acquired during the training stage. Figure 1 illustrates the training stage and Figure 2 illustrates the testing stage. Neural networks are composed of three layers. The input layer consists of a set of nodes, each representing one input to the system. The second layer, the hidden layer, applies the neural network function and the activation function. The results of the processing in the hidden layer are passed to the output layer, which produces the outputs of the system [7].
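The three-layer structure described above can be sketched as a forward pass in Python. This is a minimal illustration, not the authors' Visual Basic code: the four inputs, the convention of storing each node's bias weight as the last entry of its weight row, the use of the sigmoid in the output layer as well as the hidden layer, and the input values are all assumptions made for the example. Inputs are assumed to be scaled to [0, 1].

```python
import math
import random


def sigmoid(z):
    """Logistic activation, written to avoid overflow; output lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_nodes, rng):
    """One weight row per node: n_in input weights followed by one bias weight."""
    return [[rng.random() for _ in range(n_in + 1)] for _ in range(n_nodes)]


def layer_forward(weights, inputs):
    """Each node sums its weighted inputs plus its bias and applies the sigmoid."""
    return [sigmoid(sum(w * v for w, v in zip(row[:-1], inputs)) + row[-1])
            for row in weights]


def forward(hidden_w, output_w, x):
    """Input layer -> hidden layer -> output layer (COD, BOD, DO)."""
    hidden = layer_forward(hidden_w, x)
    return layer_forward(output_w, hidden)


rng = random.Random(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 4, 100, 3  # 4 inputs is a hypothetical choice
hidden_w = init_layer(N_INPUTS, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUTS, rng)
cod, bod, do = forward(hidden_w, output_w, [0.2, 0.5, 0.1, 0.9])
```

With 100 hidden nodes and every weight drawn from [0, 1), the weighted sums reaching the output nodes are large, so the untrained outputs sit close to 1; training moves them toward the targets.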

The Proposed System
In our research we built a system to predict COD, BOD and DO from a set of parameters, which may be properties of the water or substances present in it. These parameters form the first layer of the system (the input layer). A weight with a value between 0 and 1 is generated for each input, and the neural network function is then used to generate the output layer, which contains COD, BOD and DO as a single output. In our experiments, increasing the number of neurons in the hidden layer made the results more accurate. In the neural network function, each input is multiplied by its weight and the products are summed together with a threshold:
$$\text{net} = \sum_{i=1}^{n} w_i x_i + b \tag{1}$$
where $x_i$ is a parameter used to compute COD, BOD and DO, $w_i$ is its weight, generated as a random number between 0 and 1 by the random function in Visual Basic (RandomNumber(X)), and $b$ is the threshold (bias). To obtain the final value, we apply the activation function. The activation function transforms the input signal of a node into its output signal, and it is necessary for neural networks to model complex non-linear patterns that simpler models might miss. There are many types of activation functions, including linear, sigmoid, hyperbolic tangent and step functions; Figure 5 shows types of activation functions. We use the sigmoid function, mainly because its output lies between 0 and 1. For this reason it is used particularly in models that must predict a probability, since any probability lies in the range 0 to 1. The sigmoid function is
$$f(\text{net}) = \frac{1}{1 + e^{-\text{net}}} \tag{2}$$
and its graph is an S-shaped curve that rises from 0 to 1.
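The weighted sum with a threshold and the sigmoid activation described above can be written out directly for a single node. In this Python sketch, `random.Random.random()` stands in for the Visual Basic `RandomNumber(X)` call and draws each weight from [0, 1); the function name `neuron_output`, the seed and the input values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(x, w, b):
    """Weighted sum of the inputs plus the threshold b, passed through the sigmoid."""
    if len(x) != len(w):
        raise ValueError("each input x_i needs exactly one weight w_i")
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(net)


rng = random.Random(42)          # hypothetical seed, for reproducibility
x = [0.3, 0.7, 0.5]              # hypothetical scaled input parameters
w = [rng.random() for _ in x]    # one weight per input, drawn from [0, 1)
b = rng.random()                 # threshold (bias)
y = neuron_output(x, w, b)

print(neuron_output([1.0, 2.0], [0.5, 0.25], -1.0))  # net = 0, prints 0.5
```

The last line is a hand-checkable case: 0.5 * 1.0 + 0.25 * 2.0 - 1.0 = 0, and the sigmoid of 0 is 0.5.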

Implementation And Results
As noted above, a neural network works in two stages. In the first stage, the neurons are trained by providing them with inputs and the corresponding outputs; the hidden layer then applies the neural network equation to these inputs to obtain results close to the given outputs.

Training Stage
The application was designed and implemented in Visual Basic 2013. The parameters are imported from an Excel file and serve as the inputs to the system, except for COD, BOD and DO, which are the system's outputs. The neural network algorithm applies the neural network equation to the inputs to produce new values of BOD, COD and DO, and the system then compares these values with the target values. This is the training stage; Figure 7 illustrates the training stage implementation interface. The number of nodes in the input layer equals the number of inputs plus one node for the bias term. The number of nodes in the hidden layer is set by the user, and the more nodes there are, the more accurate the results. In the training phase we first used 50 hidden nodes; the results were good but not as close to the targets as we wanted, so we increased the number to 100 and obtained very good results. The output layer has only 3 nodes: COD, BOD and DO. Figure 7 shows the interface for setting the number of nodes in the hidden layer. The inputs were a set of parameters collected and measured from the waters of the Tigris River, with 500 meters adopted as the standard distance between measurement points; data collection took 6 months. Several inputs, such as the time of sampling, the water flow and the distance from one sampling point to the next, were found to have no effect on the results, either negative or positive, and were therefore neglected; the inputs in the following table were used because of their significant impact on the results.
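The text says the system compares its computed COD, BOD and DO values with the target values, but it does not say how the weights are then adjusted. The sketch below assumes ordinary back-propagation (gradient descent on squared error), the learning algorithm used in the related work of Türkmenler and Pala [5], together with min-max scaling of every column to [0, 1], since sigmoid outputs cannot leave that range. Plain Python lists stand in for the Excel columns. The class name, learning rate, epoch count, the five hidden nodes (kept small so the toy example runs quickly; the paper used 50 and then 100) and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function; output lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def min_max_scale(rows):
    """Scale each column to [0, 1]; also return (min, max) per column for unscaling."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    scaled = [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
              for row in rows]
    return scaled, bounds


class OneHiddenLayerNet:
    """Inputs -> sigmoid hidden layer -> sigmoid outputs (COD, BOD, DO)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Initial weights drawn from [0, 1); the last entry of each row is the bias weight.
        self.w_h = [[rng.random() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_o = [[rng.random() for _ in range(n_hidden + 1)] for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
                for row in weights]

    def predict(self, x):
        hidden = self._layer(self.w_h, x)
        return hidden, self._layer(self.w_o, hidden)

    def train_step(self, x, target, lr):
        """One back-propagation update minimising 0.5 * sum((y - t)^2) for one sample."""
        h, y = self.predict(x)
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w_o[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w_o):
            for j, hj in enumerate(h):
                row[j] -= lr * d_out[k] * hj
            row[-1] -= lr * d_out[k]
        for j, row in enumerate(self.w_h):
            for i, xi in enumerate(x):
                row[i] -= lr * d_hid[j] * xi
            row[-1] -= lr * d_hid[j]


def total_error(net, X, Y):
    """Sum over samples of 0.5 * squared error between predictions and targets."""
    return sum(0.5 * sum((yk - tk) ** 2 for yk, tk in zip(net.predict(x)[1], t))
               for x, t in zip(X, Y))


# Invented toy rows (two input parameters -> COD, BOD, DO); NOT Tigris measurements.
X_raw = [[18.0, 7.1], [22.0, 7.4], [25.0, 7.9], [29.0, 8.2], [20.0, 7.6]]
Y_raw = [[20.0, 4.0, 8.5], [24.0, 5.0, 7.9], [30.0, 6.5, 7.2], [35.0, 8.0, 6.4], [23.0, 4.8, 8.0]]
X, _ = min_max_scale(X_raw)
Y, y_bounds = min_max_scale(Y_raw)

net = OneHiddenLayerNet(n_in=2, n_hidden=5, n_out=3, seed=1)
error_before = total_error(net, X, Y)
for _ in range(2000):
    for x, t in zip(X, Y):
        net.train_step(x, t, lr=0.5)
error_after = total_error(net, X, Y)

# Map one scaled prediction back to the original units of COD, BOD and DO.
_, pred_scaled = net.predict(X[0])
pred_original = [lo + p * (hi - lo) for p, (lo, hi) in zip(pred_scaled, y_bounds)]
```

Keeping the per-column (min, max) bounds is what allows the scaled sigmoid outputs to be reported back in the units of the original data.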

Testing Stage
In the testing stage, only the inputs are supplied, without outputs, and the system produces new outputs. We enter the inputs into the system (the same inputs used in the training phase), and the system applies the neural network function in the hidden layer; the user enters the same number of hidden nodes as in the training phase. Based on what the neurons learned during training, the system produces the COD, BOD and DO results. Table 3 shows the system parameters (inputs) and results (outputs) in the testing stage. To test the results and performance of the system, an interface in the application evaluates them by calculating the following statistical parameters: the correlation coefficient (r), the root mean square error (RMSE) and the mean absolute percentage error (MAPE), defined by Eqs. (3), (4) and (5), respectively [8]:
$$r = \frac{\sum_{i=1}^{N} (Q_{o,i} - M_o)(Q_{p,i} - M_p)}{\sqrt{\sum_{i=1}^{N} (Q_{o,i} - M_o)^2 \, \sum_{i=1}^{N} (Q_{p,i} - M_p)^2}} \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Q_{o,i} - Q_{p,i})^2} \tag{4}$$
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{Q_{o,i} - Q_{p,i}}{Q_{o,i}} \right| \tag{5}$$
where $Q_{o,i}$ and $Q_{p,i}$ are the observed and estimated concentrations at time step $i$, $M_o$ and $M_p$ are the means of the observed and estimated concentrations, respectively, and $N$ is the total number of observations in the data set. Both RMSE and MAPE measure errors; RMSE is the most popular error measure, and because it squares the errors it gives much greater weight to large errors than to small ones [8]. Figure 8 illustrates the system evaluation interface.
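The three evaluation statistics (Eqs. (3) to (5) in the text) can be computed from paired observed and predicted values as follows. This is a plain-Python sketch of the standard definitions of r, RMSE and MAPE, not the evaluation code from the application; the function names and the sample numbers are hypothetical.

```python
import math


def correlation_r(obs, pred):
    """Correlation coefficient r between observed (Q_o) and predicted (Q_p) values."""
    n = len(obs)
    m_o, m_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - m_o) * (p - m_p) for o, p in zip(obs, pred))
    ss_o = sum((o - m_o) ** 2 for o in obs)
    ss_p = sum((p - m_p) ** 2 for p in pred)
    return cov / math.sqrt(ss_o * ss_p)


def rmse(obs, pred):
    """Root mean square error over N observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    if any(o == 0 for o in obs):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))


observed = [2.0, 4.0, 6.0, 8.0]     # hypothetical values
predicted = [2.0, 4.0, 6.0, 10.0]
print(rmse(observed, predicted))                      # 1.0
print(mape(observed, predicted))                      # 6.25
print(round(correlation_r(observed, predicted), 3))   # 0.983
```

The single error of 2 in the last pair shows the difference between the measures: RMSE reports it in the units of the data, while MAPE expresses it relative to the observed value (2 / 8 = 25%, averaged over four points).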

Conclusion
While designing and implementing the neural network system, we observed several constraints, the most important of which is the need to collect large amounts of data: the system depends very heavily on the data entered, and its ability to produce results comparable to the target results depends largely on the amount of data supplied to the algorithm. The number of neurons in the hidden layer also plays a large role in the accuracy of the results, because increasing it increases the number of weights, which had a positive effect on the system output. The system and its results were evaluated using the correlation coefficient (r), the root mean square error (RMSE) and the mean absolute percentage error (MAPE), and the evaluation showed excellent results.