Adsorption of tetracycline antibiotic onto modified zeolite: Experimental investigation and modeling

Artificial Neural Networks (ANNs) model and Adaptive Neuro-Fuzzy Inference System (ANFIS) were used to estimate and predict the removal efficiency of tetracycline (TC) using the adsorption process from aqueous solutions. The obtained results demonstrated that the optimum condition for removal efficiency of TC were 1.5 g L−1 modified zeolite (MZ), pH of 8.0, initial TC concentration of 10.0 mg L−1, and reaction time of 60 min. Among the different back-propagation algorithms, the Marquardt–Levenberg learning algorithm was selected for ANN Model. The log sigmoid transfer function (log sig) at the hidden layer with ten neurons in the first layer and a linear transfer function were used for prediction of the removal efficiency. Accordingly, a correlation coefficient, mean square error, and absolute error percentage of 0.9331, 0.0017, and 0.56% were obtained for the total dataset, respectively. The results revealed that the ANN has great performance in predicting the removal efficiency of TC.• ANNs used to estimate and predict tetracycline antibiotic removal using the adsorption process from aqueous solutions.• The model's predictive performance evaluated by MSE, MAPE, and R2.


Adsorption experiments and instrumental analyses
Batch experiments were conducted in a plexiglass reactor that contained a 100 mL sample of specified TC concentration at room temperature (22 ± 2 °C). After pH adjustment at the desired level, the required amount of modified zeolite (MZ) was added to the reactor and blended at 100 rpm. At the end of the given reaction time, the suspension separated by was centrifuged at 30 0 0 rpm for 5 min and the supernatant analyzed for residual TC. The fortified distilled water (distilled water containing TC) was prepared by adding a certain amount of TC into the distilled water, whereas the real wastewater was obtained by sampling the hospital wastewater.
The concentration of TC after each run measured by KNAUER HPLC using a C 18 column (150 mm, 4.6 mm, 5 μm). The mobile phase consists of 75% oxalic acid (10 mM), 13% of methanol, and 12% acetonitrile with a flow rate of 1 mL min −1 . The temperature of the oven and the wavelength of the UV detector set to 30 °C and 360 nm, respectively [1] .
The removal efficiency of TC through the adsorption process was calculated by Eq. (1) [2] : where C 0 and C t (mg L -1 ) denote the TC concentration before adsorption and at the time t , respectively.

Description of ANN method
The data was prepared to enter into the ANN model. The Matlab (version 2014b) was used for computer coding all the calculations required. The best model predicted including the best learning and experimental set, normalizing the data, the number of hidden layers and neurons per layer, the functions of transfer, conversion, and operation, as well as the learning rate, the number of repetitions and the algorithm of teaching. In this study, the data set selected randomly for train and validation was 74%, 26%, respectively. The data analysis was performed by Microsoft office 2010 software. There are no systematic methods in determining the mentioned cases, so the best design of the network is achieved through experience and Trial and error. The schematic of the ANN model used in the current study presented in Fig. 1 .
The training flowchart of ANN network has three layers: an input layer, an output layer and an hidden layer. The inputs to a neuron comprise its bias and the sum of its weighted input. The output value of a neuron depends on the neuron's inputs as well as transfer function [3] . In mathematical terms, a neuron can be described as follows: where x 1 , . . . , x m are the inputs; w k 1 , . . . , w km are the weights of neuron; u k is the linear combiner output due to the input signals; ϕ is the activation function(for this paper used Log-sigmoid); b k is the bias or threshold; and y k is the output signal of the neuron.

Description of ANFIS method
Adaptive Neuro-Fuzzy Inference System (ANFIS) combines the learning capacities of artificial neural networks (ANNs) and reasoning capacities of fuzzy systems [4] . The learning algorithm for ANFIS is a hybrid algorithm, which is a combination of gradient descent and the least-squares method [5] . The consequent parameters were optimized under the condition that the premise parameters remain fixed. The main benefit of the hybrid approach is that it converges much faster since it reduces the search space dimensions of the original pure back-propagation method used in neural networks. The ANFIS structure and research procedure are presented in Figs. 2 and 3 , respectively [6 , 7] .

Statistical modeling
To achieve acceptable results through the neural network, the selection of appropriate data as inputs and outputs of the network is required. In the current work, 30 experimental data were used for modeling. Different modes were investigated to determine the best model used. Where, the best model was selected using the highest correlation coefficient. In the current work, a total of 30 run experiments were used for the models. To determine the best input pattern for the network, the effective variables on the removal efficiency were data processed, calculated and matrixed. The  model included an input layer with four neurons (TC initial concentration, pH, adsorbent dosage, and reaction time), a hidden layer with six different states, and an output layer with a neuron (removal efficiency of TC). Before starting the learning process, to avoid some educational damages and to ensure that the network not be saturated with large numerical amounts of weight, the normalization of the input data in the range from zero to one calculated completely randomized by Eq. (4) . where X norm denotes the value of the normalized data and X, X max and X min denote the amount of input, maximum and minimum data, respectively. Finally, for comparison, the normalized data returned to the original X data after modeling according to Eq. (5) .
At the learning phase of the network, there are various criteria for stopping the neural networks. These criteria include the computational time, the error rate, and the number of steps to apply the data to the network. Relationships such as the mean correlation coefficient (R 2 ), mean square error (MSE), and mean absolute percentage error (MAPE) were used to calculate the error rate. The correlation coefficient was expressed by Eq. (6) as follows: where y i , y di , and n denote the predicted data, the real data, and the total number of data, respectively.
To evaluate the prediction performance of the developed model, the statistical standards including mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) were defined as below: Mean absolute error(MAE) = Where y i is the actual value and y i is the predicted value for the train and test.

Acquired data from developed ANN model
The most important factors in the modeling of the neural networks which have a direct effect on the model output are the main variables (the number of hidden layers and number of neurons). Therefore, the modeling process was initiated from a hidden layer, in which a neuron with six different states were selected. As seen from Table 1 , the correlation coefficient for training, validation, and total data was calculated for each state and the highest correlation coefficient were used (10-10).
After performing the training and validating phases of the neural networks, the algorithm was designed to be able to examine all possible scenarios for the number of layers, the number of neurons per layer, different transfer functions, and also different training algorithms. 74% and 26% of input data were used for network training and validation data, respectively, and a lower coefficient was applied to the error value derived from training data so that the results are less affected by this parameter. Validation and training samples must be different and are selected randomly from the original data  set, which used the "randperm" function in Matlab. The Levenberg-Marquardt algorithm with the least error in training and the goal value of 10-5 was selected as the best backpropagation algorithm. The configuration parameters of this network were obtained by the trial and error method. The performances of these networks were evaluated by error and regression parameters and correlation coefficient. The network with the least deviation from the target values for both training data and network validation data was offered as a model. The modeling data against the experimental data was depicted in Fig. 4 . As seen from the comparison of simulated values with real values, the artificial neural network models were presented in the current study as advanced tools with high potential could well predict the removal efficiency of TC.

Acquired data from developed ANFIS model
Two models of Sugeno, with automatic extraction of data from AFIS [GENFIS2], were used. The MATLAB software was adopted for comparison purposes. Moreover, the coverage threshold fixed to 0.01. Table 2 shows the used ANFIS information in the current study with the Back-propagation optimum method for the removal efficiency of TC.  In Figs. 4 and 5 , the desired values (removal efficiency of TC) versus ANFIS predictions for the verification data points were demonstrated. These figures reveal that an acceptable agreement between the predicted and experimental data achieved.

Comparison of ANN and ANFIS models
For the ANN model, the low values of the conventional error functions: MAE = 2.832, RMSE = 4.117, and MAPE = 0.046 are in an acceptable range. Besides, the correlation coefficients are greater than 0.9331, which indicated a satisfactory adjustment between the experimental data and those predicted by ANN model. While for the ANFIS model, low values of MAE = 0.759, RMSE = 2.065, and MAPE = 0.013, are in agreement with the reported range in the literature [8][9][10][11] . Furthermore, the correlation coefficient than found by ANN is closer to 1.00, which implies excellent agreement between the experimental data and those predicted by the ANN model. Table 3 listed the values of MAE, RMSE, MAPE and correlation coefficient (R 2 ) of ANN and ANFIS models for prediction of TC removal efficiency.
The predicted removal efficiency of TC by ANN and ANFIS models were compared with the experimental results in Fig. 5 .

Treatment of real wastewater and validity of the model
The removal efficiency of the treatment process through the adsorption of TC onto the modified zeolite was investigated for both fortified distilled water and real wastewater samples in five various operating conditions with superior desirability of 99%. All experiments were performed in triplicate. As seen in Table 4 , a satisfactory agreement was observed between the TC removal efficiency in fortified distilled water and real samples, however, a slight decrease in the efficiency was observed in real samples due to their complex matrix. Desirability is one of the most frequently used response optimization parameters employed for the analysis of experiments to evaluates the extent of access to the optimal conditions and ranging from 0 to 1 (least to most desirable, respectively) [12 , 13] . To appraise the reliability of optimum conditions predicted by the empirical model, a set of experiments was carried out testing the model. It is found that the deviation is smaller between the fortified distilled water sample and real sample, and this result further confirms the validity of the model.

Conclusion
In the current work, the neural network models successfully were developed to simulate the removal efficiency of TC in aqueous solution. The best network structure based on the lowest error found to be the Levenberg-Marquardt algorithm, which included four inputs, one output and ten neurons in the first layer. In the training phase, the neural network learns the relationship between the variables with the help of input data. In the validation phase, the prediction of the data values calculated as output. Consequently, the predicted results were compared with the actual values and the error rate was calculated. The error values should be at their lowest levels, for which purpose the network first designed and the training and validation practice were repeated several times to minimize the error. To evaluate and compare the results provided by the methods and error estimation, the mean square error (MSE) and mean absolute percentage error (MAPE) were calculated. Since the accuracy of the obtained results from the neural networks directly related to the accuracy and range of numbers used in the validation and network training phases, therefore, the reliable and high-precision data were used in the mentioned phases. Moreover, due to the high speed of computing in neural networks, this model was employed as an efficient model for predicting data in environmental sciences.

Declaration of Competing Interest
The authors declare that they have no known competing for financial interests or personal relationships that could have appeared to influence the work reported in this paper.