Soft Sensor Modeling of Key Effluent Parameters in Wastewater Treatment Process Based on SAE-NN

Real-time measurements of key effluent parameters play a highly crucial role in wastewater treatment. In this research work, we propose a soft sensor model based on deep learning, which combines stacked autoencoders with a neural network (SAE-NN). Firstly, based on experimental data, the easy-to-measure secondary variables that have a strong correlation with the biochemical oxygen demand (BOD5) are chosen as model inputs. Moreover, stochastic gradient descent (SGD) is used to train each layer of the SAE to optimize the weight parameters, while a genetic algorithm (GA) strategy is developed to identify the number of neurons in each hidden layer. The soft sensor model is applied to predict BOD5 in a wastewater treatment plant in order to evaluate the proposed approach. The experimental results show that the proposed SAE-NN-based soft sensor achieves better prediction performance than current common methods.


Introduction
Recently, water pollution has become one of the most serious and ongoing problems facing our world. Key variables in wastewater treatment need to be evaluated in order to control pollution and ensure that effluent quality meets international standards.
Several methods have been used to calculate the key variables in the treatment of wastewater. However, in a wastewater treatment system, a large number of variables are difficult to measure online, such as BOD5, which is normally determined off-line with a 5-day delay. This makes it inappropriate for real-time measurement and may lead to effluent quality violations. Soft sensor technology provides a good solution to these problems [1][2][3]. Soft measurement estimates variables that are difficult to measure by correlating them with available variables that are easy to measure. On a very general level, soft sensors can be categorized into two separate classes, namely, model-driven and data-driven. First-principle models are the most common members of the model-driven soft sensor family [4]. A model-driven (white-box) model is built on deep knowledge of the process mechanism. However, due to the complicated physical backgrounds and harsh conditions of industrial plants, it is difficult to model the entire process with first-principle approaches. In contrast, data-driven (black-box) models are built on historical data obtained from industrial processes, without any operational experience or prior knowledge, making them an acceptable choice for soft sensor modeling of complex processes [5]. For the development of data-driven soft sensors, an abundance of multivariate statistical and machine learning methods have been used, such as Partial Least Squares (PLS), Principal Component Analysis (PCA), Fuzzy Logic, Support Vector Regression (SVR), and Artificial Neural Networks (ANN) [6].
Data-driven models are highly sensitive to high dimensionality accompanied by a high degree of correlation among the variables, which, given the large amount of available data, can result in poor robustness and instability of soft sensor algorithms, as well as degraded prediction performance. Therefore, extracting the most useful information for soft sensor models is a crucial step [7]. The most famous linear feature representation algorithms for discovering data models are PLS and PCA. In addition, machine learning algorithms such as the Support Vector Machine (SVM) and ANN have been commonly used in soft sensor modeling because of their ability to cope with nonlinearity. However, with only one hidden layer in their model structures, these algorithms are considered shallow learning methods. Shallow learning can be useful for simple processes and can cope, using only a few samples of labeled data (including both input and target values), with problems arising from time, cost, or technical limitations.
Thus, these approaches are often unsuitable for modern applications involving highly complex processes, and more capable solutions need to be developed. In comparison with shallow architectures, deep learning with multilayer architectures performs better on such complex processes.
Deep learning has been widely implemented in natural language processing, image processing, speech recognition, etc. over the past few years [8][9][10][11]. In order to optimize the weights of deep networks, Hinton suggested a greedy layer-wise unsupervised pretraining process, which proved to be a good solution, attracted wide attention, and developed rapidly [12,13]. Recently, deep neural networks have been proposed in many fields and have undoubtedly achieved success. For complex problems that conventional neural networks cannot properly solve, deep neural networks have shown remarkable performance. They can successfully create more complex features; meanwhile, they can prevent the gradient vanishing and exploding problems that arise when learning deep architectures directly, which cause gradient-based backpropagation to fail to train the lower layers of the network [14,15]. Deep learning has also been shown to be particularly appropriate for soft sensor modeling, as it is more descriptive than conventional soft sensor models. Qiu et al. used a stacked autoencoder soft sensor to predict BOD5 in the wastewater treatment process. They showed that, compared to shallow neural networks, a deep neural network can achieve better prediction and generalization performance [16]. Wang et al. proposed a data-driven soft sensor model that integrates stacked autoencoders with support vector regression (SAE-SVR) to estimate the rotor deformation of air preheaters in thermal power plant boilers. They used the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to optimize the weight parameters and the GA to obtain the optimal SVR parameters [17]. Yan et al. proposed a deep-learning-based soft sensor model that integrates a denoising autoencoder with a neural network (DAE-NN) to estimate the flue gas oxygen content in 1000 MW ultra-supercritical units. They used improved gradient descent to update the model parameters [18]. Yuan et al. proposed a novel variable-wise weighted stacked autoencoder (VW-SAE) soft sensor for high-level output-related feature extraction on an industrial debutanizer column process to predict product concentration [8]. Liu et al. proposed a stacked-autoencoder-based deep neural network for gearbox fault diagnosis [19].
In this research work, we propose a novel soft sensor modeling approach for online prediction of key parameters in wastewater treatment, which combines a deep neural network (SAE-NN) with the GA. The main contributions of this paper are summarized as follows. (i) The SAE, which integrates autoencoders (AE) with a neural network (NN), is used for predictive modeling of the key effluent parameter BOD5 for online monitoring. To obtain the SAE, the multilayer AEs achieve coarse tuning through unsupervised learning; then the SAE achieves fine-tuning through supervised BP learning. The problem of nonlinear mapping between the auxiliary variables and the primary variable is thereby better solved. (ii) The GA is employed to determine the number of neurons in each hidden layer, addressing the issue that deep neural network structures are difficult to optimize. Consequently, the accuracy of the prediction model is improved by optimizing the network structure. (iii) To further raise the performance of the model, the original data set is augmented by resampling and polynomial interpolation, which improves the completeness of the data and alleviates overfitting of the model. Our approach is employed for the modeling and prediction of BOD5 in WWTPs. The experimental results show better prediction performance using the proposed soft sensor modeling method based on the combination of SAE-NN and GA for online wastewater monitoring.

Stacked Autoencoders. Autoencoder (AE)
is an unsupervised machine learning neural network that aims to reproduce its inputs at its outputs with as little distortion as possible; that is, the target variables match the input variables. The dimension of the output layer is therefore set to be equal to the dimension of the input layer. The main differences between an AE and a multilayer NN are as follows: (i) an AE merely requires input data and estimates the output in an unsupervised way, while a multilayer NN is strictly supervised, meaning labeled data are needed; (ii) an AE performs dimensionality reduction, which is important when the input components contain a lot of redundancy or are highly correlated. An AE consists of an encoder and a decoder. Figure 1 depicts the basic structure of the AE model. Assume that the AE input is x = [x^(1), x^(2), …, x^(d_x)]^T ∈ ℝ^{d_x}, where d_x stands for the dimension of the input.
The input x is mapped to the hidden layer h ∈ ℝ^{d_h} by the encoding function f as

h = f(Wx + b),

where d_h stands for the dimension of the hidden variable vector, W stands for the d_h × d_x encoder weight matrix, and b ∈ ℝ^{d_h} stands for the encoder bias vector. In the decoder, the hidden representation h is mapped to the output layer x̂ ∈ ℝ^{d_x} by the mapping function f̂ as

x̂ = f̂(Ŵh + b̂),

where Ŵ stands for the d_x × d_h decoder weight matrix and b̂ ∈ ℝ^{d_x} stands for the output layer bias vector. The nonlinear activation functions f and f̂ are rectified linear units (ReLU) and can be described as

f(z) = max(0, z).

The task of the AE is to make the reconstructed output x̂ as similar as possible to the initial input x. The reconstruction loss function is the mean square error, which is minimized to obtain the model parameters:

J(W, b) = (1/N) Σ_{i=1}^{N} ‖x̂^(i) − x^(i)‖²,

where N represents the overall number of training samples. The autoencoder parameters can be optimized by SGD. Within the stacked autoencoder, multiple AEs are connected layer by layer and trained through layer-wise unsupervised pretraining followed by supervised fine-tuning. Using the raw input data, unsupervised pretraining trains the first AE and obtains its feature vector. The feature vector of the former layer is then used as the input of the next layer, and this layer-wise pretraining is repeated until the entire SAE has been trained. After all the hidden layers are trained, an output layer is added on top of the SAE, and backpropagation (BP) with the labeled training set minimizes the cost function and updates the weights to achieve supervised fine-tuning.
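The single AE layer described above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: layer sizes, the initialization scale, and the class name are arbitrary choices, and training (the SGD updates) is omitted here.

```python
import numpy as np

def relu(z):
    # ReLU activation f(z) = max(0, z), used by both encoder and decoder
    return np.maximum(0.0, z)

class Autoencoder:
    """One AE layer: encoder h = f(Wx + b), decoder x_hat = f(W2 h + b2)."""

    def __init__(self, d_x, d_h, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (d_h, d_x))   # d_h x d_x encoder weights
        self.b = np.zeros(d_h)                       # encoder bias
        self.W2 = rng.normal(0.0, 0.1, (d_x, d_h))  # d_x x d_h decoder weights
        self.b2 = np.zeros(d_x)                      # output layer bias

    def encode(self, x):
        # hidden representation h (the feature vector passed to the next AE)
        return relu(self.W @ x + self.b)

    def reconstruct(self, x):
        # reconstructed output x_hat, same dimension as the input
        return relu(self.W2 @ self.encode(x) + self.b2)

    def loss(self, X):
        # mean squared reconstruction error over the N training samples
        return float(np.mean([np.sum((self.reconstruct(x) - x) ** 2) for x in X]))
```

To stack AEs, the `encode` output of a trained layer becomes the input of the next layer, matching the layer-wise pretraining described above.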

SAE Parameter Optimization
Algorithm. Using an optimization algorithm, the weights of each AE layer must be optimized in the pretraining process; these then serve as the initial parameters of the deep AE network. The BP algorithm is the most common method, but training a deep AE network with plain backpropagation typically results in poor generalization: the top-layer parameters simply adapt to fit the training data as closely as possible, regardless of the estimates of the lower-layer parameters. In this research work, we adopt the SGD algorithm to optimize the initial parameters. SGD is an optimization method for unconstrained optimization problems. In SGD, at each iteration, a few samples are selected randomly instead of the entire data set [19][20][21][22][23][24][25]. One update is conducted per sample, with the samples randomly shuffled, and hence the model parameters are updated by

θ ← θ − η ∇_θ J(θ; x^(i), y^(i)),

where η is the learning rate and (x^(i), y^(i)) is a single training example. At each iteration, we compute the gradient of the cost function for a single example rather than the sum of the cost function gradients over all the examples, so the SGD algorithm executes quickly and can also be used for online learning.
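The per-sample update with random shuffling can be sketched as below. The function names and the quadratic example in the test are illustrative; `grad_fn` stands in for whatever per-sample gradient the AE loss provides.

```python
import numpy as np

def sgd(params, grad_fn, data, lr=0.01, epochs=10, seed=0):
    """Plain SGD: shuffle the samples each epoch and update on one sample at a time.

    grad_fn(params, sample) must return the gradient of the per-sample loss.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(len(data))
    for _ in range(epochs):
        rng.shuffle(idx)  # random shuffling before each pass over the data
        for i in idx:
            # theta <- theta - eta * grad J(theta; x_i)
            params = params - lr * grad_fn(params, data[i])
    return params
```

For example, minimizing the per-sample loss (θ − x_i)² with gradient 2(θ − x_i) drives θ toward the mean of the data, one sample at a time.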

Model Structure Identification Using a Genetic Algorithm
(GA). How to determine the network architecture, that is, the appropriate number of neurons in each hidden layer, is one of the critical issues for a neural network. In this work, GA is employed to identify the number of neurons in each hidden layer. GA is a search and optimization method driven by the process of natural selection and is widely utilized for finding near-optimal solutions to optimization problems with large parameter spaces. When employing GA, two preconditions have to be satisfied: a chromosome (solution representation) must be defined, and a fitness function must be chosen to evaluate the solutions. In this work, the root mean squared error (RMSE) acts as the fitness value.
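A minimal GA of this kind is sketched below, with a chromosome being a tuple of neuron counts and lower fitness being better. The selection scheme, crossover, mutation rate, and size range are illustrative assumptions; in the paper the fitness would be the validation RMSE of a trained SAE-NN, which is replaced here by whatever callable is passed in.

```python
import random

def genetic_search(fitness, n_layers=3, size_range=(4, 32),
                   pop_size=20, generations=30, seed=0):
    """Search for hidden-layer sizes minimizing fitness (e.g., validation RMSE).

    A chromosome is a tuple of neuron counts, one per hidden layer.
    """
    rng = random.Random(seed)
    lo, hi = size_range
    # random initial population of chromosomes
    pop = [tuple(rng.randint(lo, hi) for _ in range(n_layers))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)              # lower RMSE = fitter
        survivors = pop[:pop_size // 2]    # truncation selection (keeps the best)
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_layers)   # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:             # mutation: redraw one gene
                j = rng.randrange(n_layers)
                child[j] = rng.randint(lo, hi)
            children.append(tuple(child))
        pop = survivors + children
    return min(pop, key=fitness)
```

Because the best half of the population always survives, the best chromosome found never gets worse from one generation to the next.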

Soft Sensor Modeling of Key Effluent Parameter BOD5 Based on SAE-NN.
The main objective of the SAE-NN-based soft sensor is to take the unlabeled raw process data and exploit the critical information hidden behind them. The SAE-NN-based soft sensor structure is shown in Figure 2. First of all, the original data set from the wastewater treatment plant is analyzed. Then, the secondary BOD5-related variables are selected (including labeled and unlabeled data) and used to pretrain the SAE, yielding an improved initialization for the neural network, which is then trained on the labeled data y. Finally, the prediction values of BOD5 are obtained by the SAE-NN. The soft sensor proposed in this work has two main parts: an unsupervised pretraining part (SAE) and a supervised learning part (a classical neural network, NN). For a large-scale data set such as the wastewater treatment data set, three layers of SAE can be utilized through the following steps. First, an AE is trained to acquire primary features. Second, the primary features are used as raw input to the next AE to learn secondary features. Such a procedure is repeated up to the last AE. After the stack of encoders is built, the acquired features are used as input to the NN regressor, and the training process maps them to the data labels. Eventually, all layers are merged into a stacked autoencoder topped by a final NN regressor layer, which is capable of regressing the BOD5 key effluent parameter. The SAE-NN soft sensor modeling procedure is summarized as follows:

Step 1. Select secondary variables based on process knowledge and data collection, and divide them into a training set, a validation set, and a testing set.
Step 2. Data preprocessing: resample and interpolate the data set (changing the time series observation rate) and apply data normalization so that all observations lie between 0 and 1.
Step 3. Define the deep SAE structure, train each individual AE in the unsupervised pretraining layer using the SGD algorithm to obtain the optimized weight values, and use the genetic algorithm to determine the optimum number of neurons in each hidden layer.
Step 4. Use the pretrained SAE weights to initialize the supervised neural network, and train it according to the supervised training criterion.
Step 5. Test the performance of the SAE-NN-based soft sensor.
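As an illustration of the resampling in Step 2, the sketch below upsamples a daily series to hourly frequency. Linear interpolation via `np.interp` is used here as a simple stand-in for the polynomial interpolation scheme used in the paper; the function name is our own.

```python
import numpy as np

def upsample_daily_to_hourly(values):
    """Resample a daily series to hourly frequency by interpolation.

    values: 1-D array of daily observations. Returns 24 points per day
    plus the final endpoint (linear interpolation stands in for the
    paper's polynomial scheme).
    """
    days = np.arange(len(values), dtype=float)            # original daily grid
    hours = np.arange(0.0, len(values) - 1 + 1e-9, 1.0 / 24.0)  # hourly grid
    return np.interp(hours, days, values)
```

Two daily observations thus become 25 hourly points, with the interpolated values filling in between the measured ones.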

Case Study.
In this section, the proposed soft sensor model is applied to an actual WWTP in order to predict BOD5. BOD5 is determined by a standard off-line analysis with a 5-day delay, yet it plays an essential role in controlling the key effluent indicator and preventing water body eutrophication. Soft sensor technology provides a good solution for dealing with these problems. Compared with other data-driven modeling approaches, the proposed soft sensor shows better prediction performance.

Case Description.
In a WWTP, which is basically intended to remove organic matter and nutrients, an activated sludge treatment process is commonly used. The influent rate, the performance, and the number of species of microorganisms vary over time, and the process information is very restricted. Moreover, because of climate sensitivity and seasonal changes, an online analyzer is often unavailable; this complexity and these fluctuations result in deterioration or even failure of online analyzer performance. The studied wastewater treatment plant [26] is shown in Figure 3 and consists of four essential parts: pretreatment, primary settlers, aeration tanks, and secondary settlers.
Firstly, after the primary settlers, wastewater is processed in the bioreactor tank, where microorganisms decrease the substrate level. Secondly, the sewage water is moved to the secondary settlers for biomass sludge settlement; thus, clean water is obtained at the top of the settlers. To retain a sufficient level of biomass, a fraction of the sludge is recycled to the input of the aeration tank so that the organic matter can be oxidized, and the remaining sludge is purged. The plant treats a sewage flow of 35,000 m³/d. Process variables are measured by sensors at several plant locations, giving a set of 38 values per day, 9 of which are percentages of performance. In this work, the behavior of the plant over 527 days has been considered, each day involving 38 process variables. The data set was then reduced to cope with missing attribute values: all rows with any missing data were removed, resulting in a data set with 381 instances. Selecting the correct secondary variables is necessary to achieve high performance, because irrelevant variables deteriorate the soft sensor's prediction performance. Figure 4 shows Pearson's linear correlation coefficients. Nineteen process variables were chosen to predict BOD5, including local settler performance based on SS/COD/BOD5, suspended solids (SS), sediments, biochemical oxygen demand (BOD), volatile suspended solids, chemical oxygen demand (COD) input, and global plant performance based on BOD/COD/SS input. These nineteen variables were employed as the soft sensor model inputs, and BOD5 was employed as the model output. Table 1 shows the details of the secondary variables.
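The Pearson-based variable selection can be sketched as follows. The helper name and the 0.5 threshold are illustrative assumptions; the paper selects its 19 variables from the correlation heatmap in Figure 4 rather than from a fixed cutoff.

```python
import numpy as np

def select_by_pearson(X, y, names, threshold=0.5):
    """Rank candidate secondary variables by |Pearson r| with the target.

    X: (n_samples, n_vars) matrix of candidate variables;
    y: target variable (e.g., BOD5);
    names: one label per column of X.
    Returns the (name, r) pairs whose |r| reaches the threshold.
    """
    selected = []
    for j, name in enumerate(names):
        # Pearson linear correlation between column j and the target
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            selected.append((name, r))
    return selected
```

The surviving columns would then be used as the soft sensor model inputs.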

Augmentation Processing and Data Preprocessing.
Training a deep SAE with a small data set would deteriorate the SAE's performance because of overfitting; that is, the network works well on the training set but performs poorly on the testing set [27,28]. Data augmentation is used to solve this problem, expanding the data set and reducing overfitting [29,30]. In this data augmentation method, the number of samples is increased by applying resampling with polynomial interpolation to the data set: we increased the frequency of the data set from days to hours and used an interpolation scheme to fill in the new hourly values.
To eliminate the effect of the different scales of the data set, all variables are scaled to (0, 1) by a min-max scaler according to the following equation:

x*_i = (x_i − min(x_i)) / (max(x_i) − min(x_i)),

where x*_i refers to the normalized variable and i refers to the dimension of the data set.
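The min-max scaling above can be applied per column as in this short sketch (function name ours):

```python
import numpy as np

def minmax_scale(X):
    """Scale each column of X to [0, 1]: x* = (x - min) / (max - min)."""
    xmin = X.min(axis=0)  # per-dimension minimum
    xmax = X.max(axis=0)  # per-dimension maximum
    return (X - xmin) / (xmax - xmin)
```

Note that in practice the minima and maxima should be computed on the training set only and reused for the testing set, to avoid information leakage.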

Setting Parameters of the Deep Neural Network.
The performance of the soft sensor is governed by the number of neurons in each hidden layer, and there is no definitive procedure for choosing it. In this work, the genetic algorithm is used to choose the number of neurons in each hidden layer. To evaluate the soft sensor model performance, the root mean squared error (RMSE) and the correlation coefficient (R²) are used. RMSE is calculated as

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_{i,predict} − y_{i,real})² ),

where y_{i,predict} and y_{i,real} are, respectively, the predicted value and the real value for example i, and N refers to the total number of examples in the given data set.
R² is calculated by

R² = 1 − Σ_{i=1}^{N} (y_{i,predict} − y_{i,real})² / Σ_{i=1}^{N} (y_{i,real} − ȳ)²,

where ȳ stands for the average of the test set's output values.
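The two evaluation metrics translate directly into code (function names ours):

```python
import numpy as np

def rmse(y_real, y_pred):
    # RMSE = sqrt( (1/N) * sum (y_pred - y_real)^2 )
    return float(np.sqrt(np.mean((y_pred - y_real) ** 2)))

def r2(y_real, y_pred):
    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y_real
    ss_res = np.sum((y_real - y_pred) ** 2)
    ss_tot = np.sum((y_real - np.mean(y_real)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A perfect prediction gives RMSE = 0 and R² = 1, while predicting the mean of the targets gives R² = 0.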

Simulation Experiment and Result Analysis
The proposed soft sensor was validated in this study and compared to three conventional soft sensors: SVR (among the available kernel functions such as the linear, polynomial, sigmoid, and RBF kernels, with gamma = scale and max_iter = −1, i.e., no hard limit on solver iterations), PCA-SVR combining PCA (number of components = 10) with support vector regression, and an NN with three hidden layers (activation = relu, optimizer = sgd, momentum for the gradient descent update = 0.9, and initial learning rate = 0.001). For a fair comparison, the same data set is used to train all models. Using the GA, the number of neurons in each hidden layer is determined experimentally as 13, 13, 13, and the regularization parameters C and ε of the linear-kernel SVR are 5 and 0.022, respectively. 12749 samples have been utilized for training (the initial training samples are further divided into new training samples and validation samples at a 70/30 split), and 3188 samples are used for testing. To obtain the SAE, an unsupervised layer-wise pretraining method is utilized to achieve a good initialization of the weights and biases of each AE. Every AE is trained with the SGD algorithm, with batch normalization to speed up training and Dropout for regularization, setting the batch size to 512 samples. Every AE is trained for 50 epochs, followed by supervised backpropagation-trained fine-tuning.
The batch size for fine-tuning is set to 128 samples. Table 2 describes the predictive performance of the soft sensors based on the different approaches: the prediction results of PCA-SVR, SVR, multilayer NN, and the proposed SAE-NN-based soft sensor on the training and testing data sets. As can be seen, the SAE-NN-based soft sensor has much better learning and generalization performance than the other conventional soft sensors and gives a fairly satisfactory estimate of BOD5, while PCA-SVR, SVR, and multilayer NN obtain relatively poor results. SVR has the worst results because it is unable to adequately describe the nonlinear structure of the data.
The PCA-SVR model achieved slightly better predictive results, as PCA can remove noise and redundancy from the input data to improve predictive performance. The traditional multilayer NN can approximate the complex data relationship more accurately than PCA-SVR and SVR. Nonetheless, the NN with 3 hidden layers does not provide great predictive performance compared to SAE-NN. The network parameters of the multilayer NN are randomly initialized, so it is easily trapped in local optima. That is, training an NN with BP results in a slow rate of convergence and difficulties in deciding on an appropriate architecture to reach a minimum. In contrast, the SAE extracts high-level abstract features layer by layer; therefore, these features are much more structured for prediction tasks.
That is, the performance of SAE-NN is better than that of the multilayer NN. The comparative results against the other traditional soft sensors are shown in Figures 6-8. It can be observed from Figures 6-8 that the SAE-NN-based soft sensor performs well in estimating BOD5 in the WWTP; that is, SAE-NN tracks the varying trend of BOD5 well. The prediction errors of SAE-NN are smaller than those of the other models, as can be seen. That is, the SAE-NN forecast shifts smoothly without significant variations and displays greater robustness than the predictions of the shallow-architecture models. The experiments were performed on a PC with an Intel® Core™ i5-8250U CPU @ 1.60 GHz (8 CPUs), ~1.8 GHz, and 4 GB RAM, using the Keras Python deep learning library (TensorFlow backend), version 2.2.4 [31].

Conclusions
In this paper, an SAE-NN-based data-driven soft sensor is proposed and implemented to estimate BOD5 in a wastewater treatment plant. The stacked AEs are trained to obtain the initialization weights for the supervised NN, which results in better generalization of the NN system and helps avoid overfitting. In addition, GA is employed to determine the appropriate number of neurons in each hidden layer. Generally, the soft sensor output closely approximates the real values of BOD5. In most cases, the SAE-NN-based soft sensor outperforms the other soft sensors. Deep learning is superior to shallow learning in many industrial process applications involving complex situations and is a promising approach for soft sensor modeling. Automatically selecting appropriate parameter values to improve the performance of the deep network will be the focus of future work. Further future work will also extend our approach to a supervised or semisupervised layer-wise pretraining manner.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.