Data on estimation for sodium absorption ratio: Using artificial neural network and multiple linear regressions

In this article, groundwater quality data from the Aras catchment area were investigated to estimate the sodium absorption ratio (SAR) for the years 2010–2014. An artificial neural network (ANN) is defined as a system of processing elements, called neurons, which form a network through a set of weights. In the present data article, a 3-layer MLP neural network consisting of an input layer, a hidden layer and an output layer was designed. The numbers of neurons in the input and output layers were set to 4 and 1, respectively, because there are four input variables (pH, sulfate, chloride and electrical conductivity (EC)) and only one output variable (sodium absorption ratio). The impacts of pH, sulfate, chloride and EC were estimated to be 11.34%, 72.22%, 94% and 91%, respectively. ANN and multiple linear regression methods were used to estimate the sodium absorption ratio of groundwater resources of the Aras catchment area. The outputs of both methods were compared using model performance evaluation criteria, namely root mean square error (RMSE), mean absolute error (%) and correlation coefficient. The data showed that ANN is a helpful and accurate tool for predicting SAR in groundwater resources of the Aras catchment area, and that its results are not comparable with those of multiple linear regression.


Value of the data
The data of this article can be used for environmental management and better exploitation of groundwater resources.
According to the present data, many of the sampled drinking water supply reservoirs need attention in order to meet Iran's national water quality standards.
The results clearly indicate that, with appropriate selection of input variables, artificial neural networks and multiple linear regression, as soft computing approaches, can be used to estimate water quality indices properly and reliably.

Data
Two types of training algorithm, seven back-propagation algorithms and the Levenberg-Marquardt algorithm, were used in this data article. Table 1 compares the performance of the seven back-propagation algorithms in estimating the sodium absorption ratio with 10 neurons in the hidden layer, and Table 2 compares the performance of different numbers of hidden-layer neurons in estimating the sodium absorption ratio using the Levenberg-Marquardt algorithm. The optimized output of the neural network and the data performance criteria are shown in Figs. 1 and 2. Fig. 3 shows the actual SAR values in groundwater resources and the values predicted by multiple linear regression.
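As a point of reference for the regression baseline behind Fig. 3, a multiple linear regression on the four predictors named in this article (pH, sulfate, chloride and EC) can be sketched as an ordinary least-squares fit. This is a minimal sketch, not the authors' implementation: the function names and the choice of solving the normal equations by Gaussian elimination are assumptions made for illustration, and no values from the data set are reproduced.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations (X^T X) b = X^T y.

    rows    -- one list of predictor values per sample (assumed order:
               pH, sulfate, chloride, EC)
    targets -- the measured SAR value for each sample
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefs, row):
    """Predict SAR for one sample from fitted coefficients [b0, b1, ..., bk]."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
```

Calling `fit_mlr` on the training samples and `predict_mlr` on held-out samples would give predicted values of the kind plotted against the actual SAR values.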

Study area description
The Aras catchment area is a plain located in the northern half of East Azerbaijan, West Azerbaijan and Ardabil provinces. Extensive precipitation in this region, in addition to moderating the climate, has created numerous rivers [18] (Fig. 4).

Material and methods
Data on groundwater resources were collected during the years 2010-2014, and water samples were analyzed following the standard methods for the examination of water and wastewater [1][2][3][4][5][6][7][8][9][10] in order to estimate the sodium absorption ratio (SAR). A two-layer neural network (a hidden layer and an output layer) with a tangent-sigmoid transfer function for the hidden layer and a linear transfer function for the output layer was used. The input parameters of the neural network were sulfate, chloride, electrical conductivity (EC) and pH, and the sodium absorption ratio (SAR) was the network output parameter. The data on these parameters were divided into training, validation and testing sets: 70% of the data were used for training, 15% for validation and the remaining 15% for testing.
Because back-propagation (BP) neural networks have become a common tool for modeling environmental systems, 8 BP algorithms were selected in this study and their results were tested to find the best algorithm. For all algorithms, the same two-layer network with a tan-sigmoid transfer function in the hidden layer and a linear transfer function in the output layer was used. To choose the best BP algorithm, the number of hidden neurons was fixed at 10. The performance of the model with each BP algorithm is presented in Table 1. Performance was evaluated with the mean squared error (MSE), mean absolute error (MAE) and correlation coefficient (R) between the model output and the actual data set. The algorithm with the lowest training error and the highest correlation coefficient was selected as the most suitable. The Levenberg-Marquardt algorithm (trainlm) was chosen as the best algorithm to predict the sodium adsorption ratio (SAR).
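The 70/15/15 data division and the evaluation criteria described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually run for this article: the function names, the random shuffling with a fixed seed, and the use of the Pearson coefficient for R are assumptions, and MAE is computed here in the units of SAR rather than as a percentage.

```python
import math
import random


def split_70_15_15(samples, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: an assumption, for repeatability
    n = len(shuffled)
    n_train = (70 * n) // 100
    n_val = (15 * n) // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 15%
    return train, val, test


def mse(actual, predicted):
    """Mean squared error between measured and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

Each candidate training algorithm would be scored with these functions on the same split, so that differences in MSE, MAE and R reflect the algorithm and not the partition.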
After the best BP algorithm, namely Levenberg-Marquardt, had been selected, the number of neurons was optimized while all other parameters were kept unchanged. As shown in Table 2, increasing the number of neurons beyond the optimum of 10 did not significantly change the mean squared error (MSE). Therefore, all modeling steps were carried out with 10 neurons and the Levenberg-Marquardt algorithm to predict the sodium absorption ratio [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20].
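To make the role of the hidden-layer size concrete, the network structure described in this section (four inputs, a tan-sigmoid hidden layer and one linear output neuron) can be sketched as a forward pass with the number of hidden neurons as a parameter. This is a minimal sketch under assumptions: the function names, the dictionary layout and the random initial weights are hypothetical, the tan-sigmoid function is written as `math.tanh`, which is mathematically equivalent, and Levenberg-Marquardt training itself is not shown.

```python
import math
import random


def init_network(n_inputs=4, n_hidden=10, seed=0):
    """Create random, untrained weights for an n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Tan-sigmoid hidden layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def count_parameters(n_inputs, n_hidden):
    """Number of weights and biases a training algorithm has to fit."""
    hidden_weights = n_hidden * n_inputs
    hidden_biases = n_hidden
    output_weights = n_hidden
    output_bias = 1
    return hidden_weights + hidden_biases + output_weights + output_bias
```

With four inputs and 10 hidden neurons, `count_parameters(4, 10)` gives 61 adjustable weights and biases, and each additional hidden neuron adds six more.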

Transparency document. Supplementary material
The transparency document associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.08.205.