Article

Prediction Model of Wastewater Pollutant Indicators Based on Combined Normalized Codec

1 School of Light Industry, Beijing Technology and Business University, Beijing 100048, China
2 Artificial Intelligence College, Beijing Technology and Business University, Beijing 100048, China
3 China Light Industry Key Laboratory of Industrial Internet and Big Data, Beijing Technology and Business University, Beijing 100048, China
4 Department of Computer Science and Engineering, ITM SLS Baroda University, Vadodara 391510, India
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4283; https://doi.org/10.3390/math10224283
Submission received: 29 September 2022 / Revised: 5 November 2022 / Accepted: 8 November 2022 / Published: 16 November 2022
(This article belongs to the Special Issue Computational Intelligence Methods in Bioinformatics)

Abstract

Effective prediction of wastewater treatment outcomes is beneficial for precise control of wastewater treatment processes. The nonlinearity of pollutant indicators such as chemical oxygen demand (COD) and total phosphorus (TP) makes models difficult to fit and limits prediction accuracy. Classical deep learning methods have been shown to perform well at nonlinear modeling. However, the multi-dimensional data in wastewater treatment prediction differ enormously in magnitude; for example, COD can exceed 3000 mg/L while TP stays around 30 mg/L. Such differences are difficult for current normalization methods to handle effectively, causing training to fail to converge and gradients to vanish or explode. This paper proposes a multi-factor prediction model based on deep learning. The model consists of a combined normalization layer and a codec. The combined normalization layer combines the advantages of three normalization calculation methods, z-score, Interval, and Max, enabling adaptive processing of multi-factor data while fully retaining the characteristics of the data; it then cooperates with the codec to learn the data characteristics and output the prediction results. Experiments show that the proposed model can overcome data differences and complex nonlinearity in predicting industrial wastewater pollutant indicators and achieves better prediction accuracy than classical models.

1. Introduction

In order to protect water resources and reduce environmental pollution from industrial and domestic wastewater, it is necessary to reduce the discharge of pollutants through the harmless treatment of wastewater [1]. Therefore, the effect of wastewater treatment has received extensive attention, and innovative technologies and management methods have become a current research focus.
Anaerobic biological treatment technology, also known as anaerobic digestion (AD), is widely used in the sewage treatment stage of wastewater treatment plants (WWTPs) [2]. Its treatment process typically relies on anaerobic granular sludge (AnGS) bed reactors, e.g., the up-flow anaerobic sludge blanket (UASB) reactor, the expanded granular sludge bed (EGSB) reactor, and the internal circulation (IC) reactor [3]. Due to the complexity of sludge composition, its application has limitations, mainly the inability to fully exploit functional anaerobic microorganisms, resulting in a slow hydrolysis rate and poor biodegradability [4]. Although ultrasonic irradiation and other methods can improve the efficiency of anaerobic treatment, improper parameter settings will inhibit sludge metabolism and affect the economy of wastewater treatment [5]. Moreover, the anaerobic biological action in the reactor is vulnerable to shocks from the influent, which reduce its effectiveness. For example, during heavy rain, the anaerobic bioreactor operates under a high hydraulic load to treat low-concentration sewage, subjecting the microorganisms to a period of famine.
In the case of industrial wastewater, the influent composition and flow rate are more prone to large fluctuations or even complete disruptions, affecting the microbial activity and the treatment capacity of wastewater treatment systems [2]. A large amount of surplus sludge will be discharged with the effluent, affecting the environment [6,7]. Therefore, effectively removing sludge from anaerobic reactors or reducing sludge production has become an essential topic in recent years. Another disadvantage is that the removal rate of nitrogen and phosphorus is low. The enhancement of endogenous microbial metabolism will also promote the release of nutrients such as nitrogen and phosphorus in microbial cells, increasing the nutrients in the water and affecting the removal efficiency of nitrogen and phosphorus [7].
The anaerobic/aerobic (A/O) biological nitrogen removal process is a biological sewage treatment system composed of anoxic and aerobic reaction stages. After the sewage enters the anoxic pool, it successively goes through the stages of anoxic denitrification, aerobic removal of organic matter, and nitrification. The advantages of the A/O process are lower operating costs, higher organic matter removal efficiency, less aerobic sludge, and no need for pH correction [8]. In the aerobic sludge treatment cycle, the endogenous respiration rate is high, so the content of aerobic sludge in the effluent is small [7].
In the anaerobic/anoxic/aerobic (A/A/O) process, an anoxic tank is added to the A/O process, and part of the mixed liquor from the aerobic tank is returned to the front of the anoxic tank to achieve nitrification and denitrification. This preserves the nitrogen and phosphorus removal function of the activated sludge to the maximum extent. Moreover, the standby time is greatly improved, and activity recovers quickly when wastewater is fed back in [9]. This combination process combines the advantages of each of the three reactors and is more energy efficient. Although most chemical oxygen demand (COD) and suspended solids can be removed under anaerobic and anoxic conditions, the aerobic process can further reduce the concentration of pollutants in the wastewater [10].
China enforces strict discharge standards for wastewater pollutants, limiting water quality indicators such as COD and suspended solids (SS) in treated wastewater. Take the beer industry pollutant discharge standard (GB19821-2005) [11] as an example: COD, SS, total nitrogen (TN), and total phosphorus (TP) should be lower than 80 mg/L, 70 mg/L, 15 mg/L, and 3 mg/L, respectively. To ensure that the wastewater can be discharged up to standard, some studies use time series prediction methods to model COD and other indicators at historical moments and thereby provide a basis for adjusting treatment strategies.
The modeling methods commonly used in current research include machine learning [12] and deep learning models [13]. Machine learning methods, such as K-nearest neighbor (KNN) and artificial neural network (ANN), have the advantages of convenient modeling and few parameters and have found applications in some simple prediction tasks. However, when faced with multi-factor and complex nonlinear data, their prediction accuracy often fails to meet expectations. Deep learning methods are currently the most widely used, mainly including the recurrent neural network (RNN), the long short-term memory (LSTM) neural network, and so on. Deep learning relies on big data for modeling and often achieves better results than other classical methods in strongly nonlinear and stochastic modeling tasks [14].
However, in the prediction of wastewater treatment indicators, classical deep learning methods also face some difficulties [15]. The first is the difficulty of data processing. Wastewater treatment requires multi-factor forecasting, with many predicted indicators whose values vary greatly, making it difficult for a single normalization method to process all indicators well. The second is the high data complexity. Prediction tasks often require learning from long historical data, and the nonlinearity and strong randomness of the data seriously affect the model's prediction accuracy.
A solution to the first problem is to modify the normalization part of the model so that the data are reasonably confined to a specific range, reducing data complexity and speeding up model convergence. Current research considers adaptive normalization layers, automatic selection of normalization layers, and other data-driven ways to choose a suitable normalization method adaptively. However, these improvements primarily target univariate forecasting, and the final calculation still relies on a single method. In prediction tasks with multiple factors and significant data differences, it is necessary to consider multiple normalization processing methods.
In summary, this paper proposes a combined normalization codec (CNC) model for predicting water quality indicators in wastewater treatment. The model consists of a combined normalization layer, a renormalization layer, and a codec. By integrating the advantages of several normalization processing methods, the model's prediction accuracy can be improved.
The main contributions of this paper are summarized as follows:
(1)
A combined normalized encoder structure is proposed for the multi-factor prediction problem of wastewater pollutant indicators. This structure combines the advantages of three normalization methods, which can adaptively normalize and encode pollutant index data of different magnitudes, simplify complex index data processing processes, and improve the data processing capability in multi-factor prediction.
(2)
A combined renormalized decoder structure is proposed for the prediction task. The structure uses three renormalization methods to adaptively renormalize the output value of the decoder and map to obtain the actual prediction result. Its feature of adaptively adjusting parameters in model optimization can improve model prediction accuracy.

2. Related Work

Currently, some studies use machine learning methods to predict the quality of wastewater treatment. Arismendy et al. [16] developed an intelligent system based on multilayer perceptrons that can predict the COD index to support decision-making at a sewage treatment plant. Hilal et al. [17] used a model combining KNN and the extreme learning machine (ELM) to predict the SS index, and the prediction accuracy reached 93.56%. Liu et al. [18] used the least squares support vector machine (LS-SVM) to build a prediction model, which was validated on COD prediction for an anaerobic wastewater treatment system. These machine learning models can predict water quality indicators in practice but generally target a single factor, and because the models are relatively simple, their prediction accuracy still needs improvement.
Therefore, some studies consider prediction models based on deep learning. Han et al. [19] used an adaptive fuzzy neural network to achieve multi-objective predictive control, handling conflicting control objectives by capturing the nonlinear behavior of the sewage treatment plant to improve its operational performance. Farhi et al. [20] used LSTM to build a wastewater prediction model, which showed better results than machine learning in predicting ammonia and nitrate concentrations in wastewater. Wan et al. [21] comprehensively considered spatial, temporal, and probabilistic reliability and jointly built a model from a convolutional neural network (CNN), a shared-weight long short-term memory (SWLSTM) network, and Gaussian process regression (GPR) to predict water quality; it was applied to high-precision point prediction and interval prediction monitoring of papermaking wastewater treatment systems.
These applications demonstrate the superiority of deep learning methods in wastewater treatment quality prediction. However, as the number of pollutant indicators to be modeled and the volume of training data increase, deep learning methods also expose some problems. When faced with multiple factors and large numerical differences, existing data processing methods become cumbersome to operate on the enormous amount of training data and struggle to meet processing requirements. Studies have shown that improper normalization can significantly affect model performance, reducing generalization and prediction accuracy [22]. Therefore, more efficient data processing methods must be adopted to cope with the growing demand for forecasting [23].
Passalis et al. [24] combined the z-score normalization method with a neural layer to design an adaptive normalization layer and applied it to the field of time series forecasting. The model adaptive optimization method can achieve better processing results than a fixed normalization scheme. Since this study only considers one basic normalization method, it is challenging to adapt widely to multiple forecasting scenarios. Jin et al. [25] combined z-score, Interval, decimal, and Min-Max normalization methods to design the normalization layer and renormalization layer and obtained the best predictions for a greenhouse weather dataset.
Based on the above analysis, and in combination with the practical characteristics of deep learning prediction methods, this paper proposes the CNC model. The model adopts a combined normalization approach that integrates the advantages of several normalization methods to improve the data processing effect, and it provides normalization and renormalization layers designed for the wastewater treatment indicator prediction task.

3. Combined Normalized Codec Prediction Model

The structure of the proposed combined normalized codec prediction model is shown in Figure 1. The model contains a variety of data normalization methods and can adaptively integrate their advantages through the end-to-end model optimization process, improving the model's learning on multi-dimensional data and ultimately its prediction accuracy.
The CNC model comprises three parts: a combined normalization encoder, an attention mechanism [26], and a combined renormalization decoder. The combined normalization encoder integrates an adaptive combined normalization layer containing three normalization calculation methods: z-score [27], Interval [25], and Max [28] normalization. During training, unprocessed pollutant indicator data are input directly into the adaptive combined normalization layer in batches, and three normalized results are computed from the batch's mean, variance, and other statistics. To synthesize the advantages of the three calculation methods and obtain the optimal processing effect, the three normalized results are weighted and combined based on the Softmax function [29]. The weights are learned during model training and generate the final weighted normalization result. These results are then scaled and shifted by the learnable parameters α and β, which are adjusted dynamically according to the current training effect. The exponentially weighted average method is used to fit the global distribution of the data, iteratively estimating it from the statistics of each batch; the resulting global statistics are retained, improving the trained model's prediction accuracy. Finally, the normalized data are encoded by a multilayer LSTM [30].
The attention mechanism [26] focuses on the encoded features, selecting those most relevant to the model output and ignoring unimportant ones, thus reducing the model's internal parameters and allowing it to learn from more distant historical information. The features filtered by the attention mechanism are fed into the combined renormalization decoder.
The combined renormalization decoder decodes the data features. Decoding is mainly achieved by multilayer LSTMs whose gating mechanisms preserve and learn long-term information about the sequence. After the predicted values are decoded, the final predictions are output through the adaptive combined renormalization layer. Corresponding to the adaptive combined normalization layer, this layer contains three renormalization algorithms, each of which renormalizes the LSTM output features according to the statistics recorded during data normalization. This layer also uses the Softmax function [29] to weight the three sets of renormalized results, combining them through trainable weights to obtain the best estimate. Moreover, the layer adds trainable parameters λ and ν to correct the results; the values of λ and ν are likewise learned by backpropagation. The structures of the combined normalization encoder and the combined renormalization decoder are described below.

3.1. Combined Normalized Encoder

The schematic structure of the combined normalization encoder is shown in Figure 2. The combined normalization encoder integrates the combined normalization layer on top of a conventional encoder. It combines the computational results of multiple normalizations, improving the normalization effect and ultimately the feature encoding capability of the encoder. Three normalization methods are used in the combined normalization layer, namely z-score [27], Interval [25], and Max [28], which are calculated as:
$$\hat{x} = \frac{x - mean}{\sqrt{\sigma^2 + \Delta}}$$
$$\hat{x} = a + \frac{(b - a)(x - min)}{max - min}$$
$$\hat{x} = \frac{x}{|x|_{max}}$$
where x represents the source data and x̂ the calculation result; min, max, mean, and σ² represent the minimum, maximum, mean, and variance of the source data, respectively; a and b define the normalized interval; and Δ is a fixed small positive number.
Each of the three normalizations has its strengths, mapping the input data to the standard normal distribution, to the specific interval (a, b), and to (−1, 1), respectively, and thereby exerting different effects on the data. Among them, z-score [27] processing yields data conforming to the standard normal distribution and reduces data distribution differences [31]; the Interval method [25] fixes the results in a specific interval, preventing the vanishing and exploding gradient problems; and Max [28] scaling normalization scales down the input data without changing its scale characteristics.
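As a concrete illustration, the three normalization calculations can be sketched in NumPy. The sample values below are hypothetical, chosen only to mimic the wide numerical range of wastewater indicators, and Δ is taken as 1 × 10⁻⁵ as in Algorithm 1:

```python
import numpy as np

def z_score(x, delta=1e-5):
    # Standardize to zero mean and unit variance; delta guards against
    # division by zero when the variance is tiny.
    return (x - x.mean()) / np.sqrt(x.var() + delta)

def interval(x, a=0.0, b=1.0):
    # Map the data linearly onto the interval (a, b).
    return a + (b - a) * (x - x.min()) / (x.max() - x.min())

def max_scale(x):
    # Divide by the largest absolute value; relative proportions are kept.
    return x / np.abs(x).max()

# Hypothetical COD-like readings spanning a wide range (mg/L).
x = np.array([45.0, 120.0, 850.0, 3000.0])
print(interval(x))    # values lie in [0, 1]
print(max_scale(x))   # values lie in (-1, 1]
```

Each function leaves the ordering of the data untouched; only the scale and offset change, which is why the three results can later be combined by a weighted sum.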
In order to exploit the effects of all three normalization methods on the input data, this paper uses the adaptive combined normalization method to weight the normalization results and determine the most suitable calculation. In the combined normalization layer, the Softmax function [29] acts as the combining function and is calculated as follows:
$$\mathrm{Softmax}(t_i) = \frac{e^{t_i}}{\sum_{j=1}^{n} e^{t_j}}$$
where the tᵢ are trainable parameters, optimized end-to-end by error backpropagation and adjusted dynamically according to the model training effect. In this paper, three trainable parameters are set to output the combined weights for the results of the three normalization calculations, enhancing the effectiveness of the combined normalization method. The combination using the Softmax function [29] is calculated as:
$$X = \mathrm{Softmax}(t_1) \otimes x_1 + \mathrm{Softmax}(t_2) \otimes x_2 + \mathrm{Softmax}(t_3) \otimes x_3$$
where t₁, t₂, and t₃ denote the three trainable parameters, x₁, x₂, and x₃ denote the results of the three normalization calculations, Softmax denotes the Softmax function [29], X represents the final output, and ⊗ denotes matrix multiplication.
To make the output of the combined normalization adapt better to complex data, the trainable parameters α and β are used as scaling and translation factors, respectively. These two parameters are updated during model training to better correct the calculation results, adjusting the output of the combined normalization method according to the training effect. The corrected output is calculated as:
$$Y = \alpha X + \beta$$
where Y represents the final output of the normalized layer of the batch, X denotes the value of the batch after normalization calculation, α and β are correction parameters. Finally, the combined normalized output adjusted by trainable parameters is encoded by an encoding structure composed of LSTMs to obtain the encoded features.
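A minimal NumPy sketch of this weighted combination and affine correction: the logits t₁–t₃ and the correction factors α and β are shown as fixed numbers here, whereas in the model they are trainable parameters updated by backpropagation, and the three input vectors are hypothetical normalization outputs:

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())        # subtract the max for numerical stability
    return e / e.sum()

t = np.array([0.2, -0.1, 0.5])     # combination logits t1, t2, t3 (illustrative)
alpha, beta = 1.0, 0.0             # scaling and translation corrections

# Hypothetical outputs of the three normalization calculations.
x1 = np.array([-1.0, 0.0, 1.0])    # z-score result
x2 = np.array([0.0, 0.5, 1.0])     # Interval result
x3 = np.array([-0.5, 0.0, 0.5])    # Max result

w = softmax(t)                     # combined weights, summing to 1
X = w[0] * x1 + w[1] * x2 + w[2] * x3
Y = alpha * X + beta               # affine correction Y = alpha * X + beta
```

Because the Softmax weights always sum to 1, the combination is a convex blend of the three normalized views of the data, and training simply shifts weight toward whichever view helps the downstream loss most.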
During model training, in order to capture the global distribution of the data from the batch data and ensure that the model fits the input data well at the end of training, this paper uses the exponentially weighted moving average (EWMA) method [32] to iteratively estimate the statistics of each batch and record the optimal statistical distribution. It is calculated as:
$$running\_min_t = k \cdot running\_min_{t-1} + (1 - k) \cdot min_t$$
$$running\_max_t = k \cdot running\_max_{t-1} + (1 - k) \cdot max_t$$
$$running\_mean_t = k \cdot running\_mean_{t-1} + (1 - k) \cdot mean_t$$
$$running\_\sigma_t^2 = k \cdot running\_\sigma_{t-1}^2 + (1 - k) \cdot \sigma_t^2$$
where min_t, max_t, mean_t, and σ²_t denote the minimum, maximum, mean, and variance of the batch data at time t; running_min_t, running_max_t, running_mean_t, and running_σ²_t denote the corresponding running estimates at time t (with subscript t − 1 denoting the estimates from the previous step); and k denotes the weight given to the information retained from the previous moment. In this paper, k is set to 0.6. The flow of the combined normalization layer is shown in Algorithm 1.
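The EWMA update with k = 0.6 can be sketched as follows; the starting value and per-batch means are invented solely for illustration:

```python
def ewma_update(running, batch_stat, k=0.6):
    # Retain a fraction k of the previous estimate and blend in the
    # statistic of the current batch with weight (1 - k).
    return k * running + (1 - k) * batch_stat

running_mean = 100.0                       # initial estimate (illustrative)
for batch_mean in (110.0, 95.0, 105.0):    # per-batch means (illustrative)
    running_mean = ewma_update(running_mean, batch_mean)
# running_mean is now a smoothed estimate of the global mean
```

The same update rule is applied to the minimum, maximum, and variance, so the layer tracks one smoothed copy of each statistic across batches.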
Algorithm 1: Pseudocode for the combined normalization algorithm.
Input: data R = {x₁, …, x_m}; interval (a, b); forgetting weight k; parameters α, β, t₁, t₂, t₃
Output: {yᵢ = CNLayer_{α,β}(xᵢ)}
min_R ← min(x), max_R ← max(x), μ_R ← (1/m) Σᵢ xᵢ, σ²_R ← (1/m) Σᵢ (xᵢ − μ_R)², d_R ← 10^⌈log₁₀ |x|_max⌉
Softmax(t_j) ← e^{t_j} / (e^{t₁} + e^{t₂} + e^{t₃}), j = 1, 2, 3
running_max_t ← k · running_max_{t−1} + (1 − k) · max_R
running_min_t ← k · running_min_{t−1} + (1 − k) · min_R
running_mean_t ← k · running_mean_{t−1} + (1 − k) · μ_R
running_var_t ← k · running_var_{t−1} + (1 − k) · σ²_R
running_d_t ← k · running_d_{t−1} + (1 − k) · d_R
output₁ ← (xᵢ − running_mean_t) / √(running_var_t + 1 × 10⁻⁵)
output₂ ← a + (b − a)(xᵢ − running_min_t) / (running_max_t − running_min_t)
output₃ ← xᵢ / running_d_t
output ← Softmax(t₁) · output₁ + Softmax(t₂) · output₂ + Softmax(t₃) · output₃
yᵢ ← α · output + β ≡ CNLayer_{α,β}(xᵢ)

3.2. Attention Mechanism

In this paper, the scaled dot-product attention mechanism [33,34] is applied to the features output by the combined normalization encoder. By adaptively selecting relevant feature information, highly relevant features are retained and irrelevant features are ignored, thereby improving the subsequent decoding. The structure of the scaled dot-product attention mechanism is shown in Figure 3.
It can be seen that the feature vectors from the combined normalization encoder are passed through three different linear layers to obtain the query vector Q, the key vector K, and the value vector V. First, the dot product of Q and K is computed to obtain their similarity matrix. Next, the similarity matrix is scaled. Then, the attention weights are obtained by normalizing the values of the similarity matrix with the Softmax function [29]; using the Softmax function ensures that the weights sum to 1. Finally, the dot product of the attention weights and V gives the final result. The calculation process is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V$$
where d denotes the scaling factor, Q, K, and V denote the query, key, and value vectors, respectively, Softmax denotes the Softmax function [29], and Attention(Q, K, V) denotes the final result.
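The calculation above can be sketched in NumPy. The dimensions are arbitrary; in the model, Q, K, and V come from linear layers applied to the encoder features:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = K.shape[-1]                                  # scaling factor
    scores = Q @ K.T / np.sqrt(d)                    # scaled similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, feature dimension 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))
out, w = scaled_dot_product_attention(Q, K, V)
# out holds one attended feature vector per query position
```

Scaling by √d keeps the dot products from growing with the feature dimension, which would otherwise push the softmax into near-one-hot, vanishing-gradient territory.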

3.3. Combined Renormalized Decoder

The combined renormalization decoder consists of an LSTM model and an adaptive combined renormalization layer; Figure 4 shows its schematic structure. The output features of the attention mechanism first pass through a decoder consisting of multiple layers of LSTMs, which decode the features into normalized predicted values. To obtain the actual predicted values, these are processed by the combined renormalization layer. Corresponding to the normalization calculations, the adaptive combined renormalization layer includes three renormalization calculations, computed as follows:
$$x = \hat{x} \cdot \sqrt{\sigma^2 + \Delta} + mean$$
$$x = \frac{(max - min)(\hat{x} - a)}{b - a} + min$$
$$x = \hat{x} \cdot |x|_{max}$$
where x represents the renormalized data and x̂ the data before renormalization; min, max, mean, and σ² represent the minimum, maximum, mean, and variance of the input data, respectively, all shared with the normalization calculation and updated across batches; a and b represent the interval set by the renormalization method; and Δ is a fixed small positive number.
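Each renormalization is the exact inverse of its normalization when the two share the same statistics. A round-trip sketch for the Interval pair, with hypothetical sample values:

```python
import numpy as np

def interval_norm(x, a, b, xmin, xmax):
    return a + (b - a) * (x - xmin) / (xmax - xmin)

def interval_renorm(x_hat, a, b, xmin, xmax):
    # Inverse of interval_norm; reuses the min/max statistics recorded
    # during the normalization pass.
    return (xmax - xmin) * (x_hat - a) / (b - a) + xmin

x = np.array([45.0, 850.0, 3000.0])        # hypothetical COD readings (mg/L)
xmin, xmax = x.min(), x.max()
x_hat = interval_norm(x, 0.0, 1.0, xmin, xmax)
x_back = interval_renorm(x_hat, 0.0, 1.0, xmin, xmax)
# x_back recovers x up to floating-point error
```

The z-score and Max pairs invert in the same way, which is why the decoder must read the running statistics saved by the normalization layer rather than recompute them.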
To combine the results of the three renormalization calculations and improve the overall data processing, the Softmax [29] combining function is also added to the combined renormalization layer. It is applied to three trainable parameters to output the combined weights for the three renormalized results; the parameters can be optimized by error backpropagation to improve the effectiveness of the renormalization combination. The combination is calculated as follows:
$$H = \mathrm{Softmax}(c_1) \otimes h_1 + \mathrm{Softmax}(c_2) \otimes h_2 + \mathrm{Softmax}(c_3) \otimes h_3$$
$$\mathrm{Softmax}(c_j) = \frac{e^{c_j}}{\sum_{i=1}^{3} e^{c_i}}$$
where c₁, c₂, and c₃ denote the three trainable parameters, h₁, h₂, and h₃ denote the results of the three renormalization calculations, Softmax denotes the Softmax function [29], H denotes the final output, and ⊗ denotes matrix multiplication.
Similarly, the combined renormalization layer incorporates the learnable correction parameters λ and ν as the scaling and translation factors, respectively. The expression at the output of the renormalization layer modified by the correction parameter can be expressed as:
$$O = \lambda H + \nu$$
where O represents the predicted output of the renormalization layer, H represents the value after the renormalization calculation, λ is the scaling factor, and ν is the translation factor. Finally, the output O is used as the predicted value of the model. The flow of the algorithm for combined renormalization layer is shown in Algorithm 2.
Algorithm 2: Pseudocode for the combined renormalization algorithm.
Input: data R̂ = {x̂₁, …, x̂_m}; interval (a, b); learning parameters λ, ν, c₁, c₂, c₃
Output: {ŷᵢ = CRNLayer_{λ,ν}(x̂ᵢ)}
Softmax(c_j) ← e^{c_j} / (e^{c₁} + e^{c₂} + e^{c₃}), j = 1, 2, 3
output₂₁ ← x̂ᵢ · √(running_var + 1 × 10⁻⁵) + running_mean
output₂₂ ← (running_max − running_min)(x̂ᵢ − a) / (b − a) + running_min
output₂₃ ← x̂ᵢ · running_d
output ← Softmax(c₁) · output₂₁ + Softmax(c₂) · output₂₂ + Softmax(c₃) · output₂₃
ŷᵢ ← λ · output + ν ≡ CRNLayer_{λ,ν}(x̂ᵢ)

4. Experiment

In this experiment, time-series data of pollutant indicators at the inlet and outlet of a brewery wastewater treatment process were used. Beer is an alcoholic beverage brewed from malted grain, hops, and water as the primary raw materials, through liquid gelatinization and saccharification followed by liquid fermentation [35]. Beer is the fifth most-consumed beverage globally, behind only tea, carbonated beverages, milk, and coffee, with an average consumption of 23 L per person per year [36]. Beer production requires a lot of water: each cubic meter of beer produced generally consumes 10–20 m³ of water, more than 90% of which is discharged into a sewer system, and wastewater is produced at all stages of production [37]. Moreover, beer wastewater has a high concentration of soluble organic pollutants and SS [38], and the COD of the production wastewater is high because most of the organic matter in the water consists of sugars, starches, and proteins [39]. The biological methods commonly used for beer wastewater treatment include the aerobic sequencing batch reactor, the cross-flow ultrafiltration membrane anaerobic reactor, and the UASB [40]. Beer wastewater treatment produces methane [39], so better treatment strategies could yield better economic benefits while protecting the environment.
The concentration of pollutants such as COD, SS, TN, and TP detected in the wastewater treatment process is an essential indicator of wastewater treatment, and whether it meets the national discharge standards is the determining factor for judging the effect of wastewater treatment. Predicting the future treatment effect according to the pollutant concentration index of the input wastewater at a historical time to assist in decision-making is a hot issue in current research. However, due to the multi-factor, complex, and nonlinear characteristics of forecasting tasks, higher requirements are placed on forecasting models’ data processing and modeling capabilities. Therefore, this study uses COD, SS, TN, and TP data before and after brewery wastewater treatment to verify the model’s prediction accuracy.

4.1. Experimental Procedure and Evaluation Index

Based on the pollutant concentration indicator data from an actual brewery wastewater treatment process, the prediction accuracy of the proposed model is compared with that of seven classical prediction models: ANN [41], deep neural network (DNN) [42], LSTM [43], gated recurrent unit (GRU) [44], Attention_LSTM [45], Attention_GRU [46], and Codec [47].
The predictive models were built on the open-source TensorFlow deep learning framework. In the comparative experiments, the hyperparameters were set as follows: all prediction models were optimized using the Adam optimization algorithm with a learning rate of 0.0001; the batch size of the data input to the network was set to 10; and the number of iterations per training run was 300. To avoid the influence of random model errors on the prediction results, all comparative experiments were repeated ten times independently, and the average value was taken as the final result.
In this paper, four evaluation indicators are used to evaluate the experimental results: root mean square error (RMSE) [48], mean absolute error (MAE) [49], mean absolute percentage error (MAPE) [50], and the Pearson correlation coefficient (R) [51]. All four indicators measure the difference between the values predicted by the model and the actual values. Smaller RMSE, MAE, and MAPE values indicate a smaller difference between predicted and actual values, while a larger R value indicates a better fitting ability.
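The four evaluation indicators follow their standard definitions and can be computed as below; the function names are illustrative. Note that MAPE is sometimes reported as a ratio rather than a percentage, so the scale should be checked against the convention used in Table 1.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percentage error (in percent); assumes y has no zeros."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def pearson_r(y, yhat):
    """Pearson correlation coefficient between predictions and targets."""
    return float(np.corrcoef(np.asarray(y, float), np.asarray(yhat, float))[0, 1])
```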

4.2. Validation Results

The dataset consists of the four pollutant concentration indicators COD, SS, TN, and TP detected during brewery wastewater treatment. About 720 data sets were collected from a wastewater treatment station between 11 June and 11 July 2022, at a sampling interval of 1 h. Each data set includes the four pollutant concentration indicators at both the inlet and the outlet. The structure of the dataset is shown in Figure 5.
In the experiment, the proposed CNC model is compared with the classical prediction models ANN [41], DNN [42], LSTM [43], GRU [44], Attention_LSTM [45], Attention_GRU [46], and Codec [47], and its superiority in predicting the actual wastewater treatment effect is verified by the experimental results. The pollutant concentration indicators of the water inlet from time t − 30 to t were used to predict the pollutant concentration indicators of the water outlet at time t + 1. The dataset is divided into a 90% training set and a 10% test set.
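The sliding-window setup described above (inlet indicators from t − 30 to t predicting outlet indicators at t + 1, with a chronological 90/10 split) can be sketched as follows; the helper names are illustrative, and the window of 31 steps assumes the endpoints t − 30 and t are both included.

```python
import numpy as np

def make_windows(inlet, outlet, window=31):
    """inlet, outlet: arrays of shape (T, 4), one column per indicator
    (COD, SS, TN, TP). Returns X of shape (N, window, 4) and y of shape (N, 4),
    where each X[i] covers inlet steps t-30..t and y[i] is outlet step t+1."""
    X, y = [], []
    for t in range(window - 1, len(inlet) - 1):
        X.append(inlet[t - window + 1 : t + 1])
        y.append(outlet[t + 1])
    return np.array(X), np.array(y)

def chrono_split(X, y, train_frac=0.9):
    """Chronological split: the first 90% of windows train, the last 10% test."""
    n = int(len(X) * train_frac)
    return (X[:n], y[:n]), (X[n:], y[n:])
```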
The prediction accuracy evaluation indexes of the comparative models are shown in Table 1, and Figure 6 compares the predicted and actual values of each model. The RMSE, MAE, and MAPE of the proposed CNC model are reduced by 1.5%, 3.2%, and 0.5%, respectively, and the R indicator is increased by 0.1%, compared with the second-best Codec model. These results show that the proposed model achieves better performance indicators, and its predictions are closer to the actual situation.
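The quoted percentage improvements can be verified directly from the Table 1 values, taking the Codec model as the baseline:

```python
def relative_change(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline * 100

# Codec baseline vs. proposed CNC, values from Table 1:
rmse_gain = relative_change(4.4221, 4.3547)        # ~1.5% lower RMSE
mae_gain  = relative_change(3.2171, 3.1126)        # ~3.2% lower MAE
mape_gain = relative_change(1.0121, 1.0071)        # ~0.5% lower MAPE
r_gain    = (0.9749 - 0.9738) / 0.9738 * 100       # ~0.1% higher R
```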
Consistent with how other deep learning models are deployed in engineering practice [52,53], the model first needs to be pre-trained on historical data, which can take hours or even days. The training effect is optimized by repeatedly adjusting the hyperparameters until the gap between the model's predicted output and the reference value meets the requirements; the trained parameters are then saved for practical application. Under this deployment scheme, new data are fed into the already-trained model, so inference no longer requires lengthy computation: a predicted value can be given within 100 ms, meeting the real-time requirement.
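The train-offline/predict-online pattern described above can be sketched as below. The wrapper class is a hypothetical illustration (the paper does not publish its deployment code); it only shows the separation between the slow offline phase and the fast online call, along with a latency measurement against the 100 ms target.

```python
import time

class DeployedPredictor:
    """Offline: train once (hours to days) and persist the weights.
    Online: load the trained model once, then serve fast predictions."""

    def __init__(self, trained_model):
        # `trained_model` is any callable mapping an input window to predictions,
        # e.g. a restored deep learning model.
        self.model = trained_model

    def predict_latest(self, window_batch):
        """Run one inference call and report its latency in milliseconds."""
        start = time.perf_counter()
        out = self.model(window_batch)
        latency_ms = (time.perf_counter() - start) * 1000
        return out, latency_ms  # deployment target: latency_ms < 100
```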

5. Conclusions

The organic and inorganic pollutants in wastewater produced by factories not only pollute soil and water bodies but also endanger human health through enrichment along the food chain. However, the volatility and nonlinearity of wastewater treatment make predictive modeling and early regulation difficult, which seriously affects treatment efficiency [54].
To predict pollutant indicators in brewery wastewater treatment and thereby assist management, a combined normalized codec (CNC) prediction model was proposed for multi-factor, strongly nonlinear prediction tasks. In this model, multi-factor pollutant index data such as COD and SS are first input into the combined normalization encoder, where the data are adaptively processed by combining the advantages of the three normalization methods and the encoder extracts data features. The decoder then performs feature decoding after the features are weighted by the attention mechanism. Finally, a combined renormalization layer adaptively renormalizes the data and outputs the prediction results. The constructed CNC model was used to predict the four pollutant indicators COD, SS, TN, and TP in brewery wastewater treatment and compared with the classical prediction models. The proposed model's RMSE [48], MAE [49], and MAPE [50] indicators were 4.355, 3.113, and 1.007, and its R [51] index reached 0.975, all better than the comparison models. The experimental results show that the model is well suited to the management and application of wastewater treatment.
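The three component normalizations (z-score, Interval, and Max) and the inverse transforms used by the renormalization stage can be sketched as below. This is a minimal illustration: the CNC model's adaptive combination weights are not reproduced, and all function names are introduced here for clarity.

```python
import numpy as np

def zscore(x):
    """Standardize to zero mean and unit variance; return stats for the inverse."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, (mu, sigma)

def zscore_inv(z, stats):
    mu, sigma = stats
    return z * sigma + mu

def interval(x):
    """Interval (min-max) scaling to [0, 1]; return bounds for the inverse."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def interval_inv(z, stats):
    lo, hi = stats
    return z * (hi - lo) + lo

def max_norm(x):
    """Divide by the maximum absolute value; return that value for the inverse."""
    m = np.abs(x).max()
    return x / m, m

def max_norm_inv(z, m):
    return z * m
```

Keeping the per-series statistics alongside each transform is what lets the renormalization layer map the network's outputs back to physical units (mg/L), even when indicators differ by orders of magnitude (e.g., COD above 3000 mg/L versus TP around 30 mg/L).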
In future work, the model will be further improved to increase its prediction accuracy, and its applicability will be verified by applying it to more scenarios.

Author Contributions

Conceptualization, C.-M.X. and J.-S.Z.; formal analysis, J.-L.K.; funding acquisition, X.-B.J.; investigation, Y.-T.B.; methodology, J.-S.Z. and X.-B.J.; resources, C.-M.X.; software, J.-S.Z. and L.-Q.K.; supervision, J.-L.K., Y.-T.B., T.-L.S. and P.C.; validation, X.-B.J. and P.C.; visualization, J.-S.Z. and H.-J.M.; writing—original draft, J.-S.Z. and L.-Q.K.; writing—review and editing, J.-S.Z. and X.-B.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China, Grant Nos. 62173007, 62006008, and 61903009.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Solon, K.; Volcke, E.I.; Spérandio, M.; Van Loosdrecht, M.C. Resource recovery and wastewater treatment modelling. Environ. Sci-Wat. Res. 2019, 5, 631–642. [Google Scholar] [CrossRef]
  2. Meena, R.S.; Kumar, S.; Datta, R.; Lal, R.; Vijayakumar, V.; Brtnicky, M.; Marfo, T.D. Impact of agrochemicals on soil microbiota and management: A review. Land 2020, 9, 34. [Google Scholar] [CrossRef] [Green Version]
  3. Chen, W.; Yu, T.; Xu, D.; Li, W.; Pan, C.; Li, Y. Performance of DOuble Circulation Anaerobic Sludge bed reactor: Biomass self-balance. Bioresour. Technol. 2021, 320, 124407. [Google Scholar] [CrossRef]
  4. Corbala-Robles, L.; Ronsse, F.; Pieters, J.G.; Volcke, E.I.P. Heat recovery during treatment of highly concentrated wastewater: Economic evaluation and influencing factors. Water Sci. Technol. 2018, 78, 2270–2278. [Google Scholar] [CrossRef] [PubMed]
  5. Zhu, Y.; Li, X.; Du, M.; Liu, Z.; Luo, H.; Zhang, T. Improve bio-activity of anaerobic sludge by low energy ultrasound. Water. Sci. Technol. 2015, 72, 2221–2228. [Google Scholar] [CrossRef] [PubMed]
  6. Do Amaral, K.C.; Aisse, M.M.; Possetti, G.R.C.; Prado, M.R. Use of life cycle assessment to evaluate environmental impacts associated with the management of sludge and biogas. Water Sci. Technol. 2018, 77, 2292–2300. [Google Scholar] [CrossRef]
  7. Cydzik-Kwiatkowska, A.; Zielińska, M. Bacterial communities in full-scale wastewater treatment systems. World J. Microbiol. Biotechnol. 2016, 32, 1–8. [Google Scholar] [CrossRef] [Green Version]
  8. Chan, Y.J.; Chong, M.F.; Law, C.L.; Hassell, D.G. A review on anaerobic–aerobic treatment of industrial and municipal wastewater. Chem. Eng. J. 2019, 155, 1–18. [Google Scholar] [CrossRef]
  9. Yilmaz, G.; Lemaire, R.; Keller, J.; Yuan, Z. Effectiveness of an alternating aerobic, anoxic/anaerobic strategy for maintaining biomass activity of BNR sludge during long-term starvation. Water. Res. 2007, 41, 2590–2598. [Google Scholar] [CrossRef]
  10. Acharya, N.; Kumar, V.; Gupta, V.; Thakur, C.; Chaudhari, P.K. Aerobic sequential batch reactor for domestic sewage treatment: Parametric optimization and kinetics studies. Int. J. Chem. React. Eng. 2021, 20, 609–617. [Google Scholar] [CrossRef]
  11. Ministry of Ecology and Environment of the People’s Republic of China. GB19821-2005; Discharge Standard of Pollutants for Beer industry. China Standard Press: Beijing, China, 2005.
  12. Ly, Q.V.; Truong, V.H.; Ji, B.; Nguyen, X.C.; Cho, K.H.; Ngo, H.H.; Zhang, Z. Exploring potential machine learning application based on big data for prediction of wastewater quality from different full-scale wastewater treatment plants. Sci. Total Environ. 2022, 832, 154930. [Google Scholar] [CrossRef] [PubMed]
  13. Kim, J.; Yoon, H.; Kim, M.S. Tweaking deep neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5715–5728. [Google Scholar] [CrossRef] [PubMed]
  14. Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A survey of deep learning and its applications: A new paradigm to machine learning. Arch. Comput. 2020, 27, 1071–1092. [Google Scholar] [CrossRef]
  15. Li, X.; Yi, X.; Liu, Z.; Liu, H.; Chen, T.; Niu, G.; Ying, G. Application of novel hybrid deep leaning model for cleaner production in a paper industrial wastewater treatment system. J. Clean. Prod. 2021, 294, 126343. [Google Scholar] [CrossRef]
  16. Arismendy, L.; Cárdenas, C.; Gómez, D.; Maturana, A.; Mejía, R.; Quintero, M.C.G. Intelligent system for the predictive analysis of an industrial wastewater treatment process. Sustainability 2020, 12, 6348. [Google Scholar] [CrossRef]
  17. Hilal, A.M.; Althobaiti, M.M.; Eisa, T.A.E.; Alabdan, R.; Hamza, M.A.; Motwakel, A.; Negm, N. An Intelligent Carbon-Based Prediction of Wastewater Treatment Plants Using Machine Learning Algorithms. Adsorpt. Sci. Technol. 2022, 8448489. [Google Scholar] [CrossRef]
  18. Liu, G.; He, T.; Liu, Y.; Chen, Z.; Li, L.; Huang, Q.; Liu, J. Study on the purification effect of aeration-enhanced horizontal subsurface-flow constructed wetland on polluted urban river water. Environ. Sci. Pollut. R. 2019, 26, 12867–12880. [Google Scholar] [CrossRef]
  19. Han, H.; Liu, Z.; Hou, Y.; Qiao, J. Data-driven multi-objective predictive control for wastewater treatment process. IEEE Trans. Industr. Inform. 2019, 16, 2767–2775. [Google Scholar] [CrossRef]
  20. Farhi, N.; Kohen, E.; Mamane, H.; Shavitt, Y. Prediction of wastewater treatment quality using LSTM neural network. Environ. Technol. Innov. 2021, 23, 101632. [Google Scholar] [CrossRef]
  21. Wan, X.; Li, X.; Wang, X.; Yi, X.; Zhao, Y.; He, X.; Huang, M. Water quality prediction model using Gaussian process regression based on deep learning for carbon neutrality in papermaking wastewater treatment system. Environ. Res. 2022, 211, 112942. [Google Scholar] [CrossRef]
  22. Jain, S.; Shukla, S.; Wadhvani, R. Dynamic selection of normalization techniques using data complexity measures. Expert Syst. Appl. 2018, 106, 252–262. [Google Scholar] [CrossRef]
  23. Alexandropoulos, S.A.N.; Kotsiantis, S.B.; Vrahatis, M.N. Data preprocessing in predictive data mining. Knowl. Eng. Rev. 2019, 34, 1–33. [Google Scholar] [CrossRef] [Green Version]
  24. Passalis, N.; Tefas, A.; Kanniainen, J.; Gabbouj, M.; Iosifidis, A. Deep adaptive input normalization for time series forecasting. IEEE Trans. Neural. Netw. Learn. Syst. 2019, 31, 3760–3765. [Google Scholar] [CrossRef] [Green Version]
  25. Jin, X.; Zhang, J.; Kong, J.; Su, T.; Bai, Y. A reversible automatic selection normalization (RASN) deep network for predicting in the smart agriculture system. Agronomy 2022, 12, 591. [Google Scholar] [CrossRef]
  26. Wang, Q.; Hao, Y. ALSTM: An attention-based long short-term memory framework for knowledge base reasoning. Neurocomputing 2020, 399, 342–351. [Google Scholar] [CrossRef]
  27. Surucu, M.; Isler, Y.; Perc, M.; Kara, R. Convolutional neural networks predict the onset of paroxysmal atrial fibrillation: Theory and applications. Chaos 2021, 31, 113119. [Google Scholar] [CrossRef] [PubMed]
  28. Singh, D.; Singh, B. Feature wise normalization: An effective way of normalizing data. Pattern. Recognit. 2022, 122, 108307. [Google Scholar] [CrossRef]
  29. Totaro, S.; Hussain, A.; Scardapane, S. A non-parametric softmax for improving neural attention in time-series forecasting. Neurocomputing 2020, 381, 177–185. [Google Scholar] [CrossRef]
  30. Van Houdt, G.; Mosquera, C.; Napoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  31. Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.H.; Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. ICLR 2021, 1–25. [Google Scholar]
  32. Holt, C.C. Forecasting seasonals and trends by exponentially weighted moving averages. Int. J. Forecast. 2004, 20, 5–10. [Google Scholar]
  33. Rad, A.C.; Lemnaru, C.; Munteanu, A. A Comparative Analysis between Efficient Attention Mechanisms for Traffic Forecasting without Structural Priors. Sensors 2022, 22, 7457. [Google Scholar] [CrossRef]
  34. Chen, Y.; Xia, S.; Zhao, J.; Zhou, Y.; Niu, Q.; Yao, R.; Liu, D. ResT-ReID: Transformer block-based residual learning for person re-identification. Pattern. Recogn. Lett. 2022, 157, 90–96. [Google Scholar] [CrossRef]
  35. Karlovic, A.; Juric, A.; Coric, N.; Habschied, K.; Krstanovic, V.; Mastanjevic, K. By-products in the malting and brewing industries—re-usage possibilities. Fermentation 2020, 6, 82. [Google Scholar] [CrossRef]
  36. Fillaudeau, L.; Blanpain-Avet, P.; Daufin, G. Water, wastewater and waste management in brewing industries. J. Clean. Prod. 2006, 14, 463–471. [Google Scholar] [CrossRef]
  37. Mielcarek, A.; Janczukowicz, W.; Ostrowska, K.; Jóźwiak, T.; Kłodowska, I.; Rodziewicz, J. Biodegradability evaluation of wastewaters from malt and beer production. J. Inst. Brew. 2013, 119, 242–250. [Google Scholar] [CrossRef]
  38. Shao, X.; Peng, D.; Teng, Z.; Ju, X. Treatment of brewery wastewater using anaerobic sequencing batch reactor (ASBR). Bioresour. Technol. 2008, 99, 3182–3186. [Google Scholar] [CrossRef]
  39. Sangeetha, T.; Guo, Z.; Liu, W.; Cui, M.; Yang, C.; Wang, L. Cathode material as an influencing factor on beer wastewater treatment and methane production in a novel integrated upflow microbial electrolysis cell (Upflow-MEC). Int. J. Hydrogen. Energ. 2016, 41, 2189–2196. [Google Scholar] [CrossRef] [Green Version]
  40. Feng, Y.; Wang, X.; Logan, B.E.; Lee, H. Brewery wastewater treatment using air-cathode microbial fuel cells. Appl. Microbiol. Biotechnol. 2008, 78, 873–880. [Google Scholar] [CrossRef]
  41. Kujawa, S.; Niedbała, G. Artificial neural networks in agriculture. Agriculture 2021, 11, 497. [Google Scholar] [CrossRef]
  42. Poznyak, A.; Chairez, I.; Poznyak, T. A survey on artificial neural networks application for identification and control in environmental engineering: Biological and chemical systems with uncertain models. Annu. Rev. Control. 2019, 48, 250–272. [Google Scholar] [CrossRef]
  43. Jin, X.B.; Gong, W.T.; Kong, J.L.; Bai, Y.T.; Su, T.L. A variational bayesian deep network with data self-screening layer for massive time-series data forecasting. Entropy 2022, 24, 335. [Google Scholar] [CrossRef] [PubMed]
  44. Oliveira, P.; Fernandes, B.; Analide, C.; Novais, P. Forecasting energy consumption of wastewater treatment plants with a transfer learning approach for sustainable cities. Electronics 2021, 10, 1149. [Google Scholar] [CrossRef]
  45. Abbasimehr, H.; Paki, R. Improving time series forecasting using LSTM and attention models. J. Amb. Intel. Hum. Comp. 2022, 13, 673–691. [Google Scholar] [CrossRef]
  46. Jung, S.; Moon, J.; Park, S.; Hwang, E. An attention-based multilayer GRU model for multistep-ahead short-term load forecasting. Sensors 2021, 21, 1639. [Google Scholar] [CrossRef] [PubMed]
  47. Dorado Rueda, F.; Durán Suárez, J.; del Real Torres, A. Short-term load forecasting using encoder-decoder wavenet: Application to the french grid. Energies 2021, 14, 2524. [Google Scholar] [CrossRef]
  48. Mentaschi, L.; Besio, G.; Cassola, F.; Mazzino, A. Problems in RMSE-based wave model validations. Ocean. Model. 2013, 72, 53–58. [Google Scholar] [CrossRef]
  49. Fan, B.; Xing, X. Intelligent prediction method of building energy consumption based on deep learning. Sci. Program. Neth. 2021, 2021, 3323316. [Google Scholar] [CrossRef]
  50. Alghamdi, H.A. A time series forecasting of global horizontal irradiance on geographical data of Najran Saudi Arabia. Energies 2022, 15, 928. [Google Scholar] [CrossRef]
  51. Kim, N.; Park, S.; Lee, J.; Choi, J.K. Load profile extraction by mean-shift clustering with sample Pearson correlation coefficient distance. Energies 2018, 11, 2397. [Google Scholar] [CrossRef] [Green Version]
  52. Pruneski, J.A.; Williams, R.J.; Nwachukwu, B.U.; Ramkumar, P.N.; Kiapour, A.M.; Martin, R.K.; Pareek, A. The development and deployment of machine learning models. Knee Surg. Sports Traumatol. Arthrosc. 2022. online ahead of print. [Google Scholar] [CrossRef] [PubMed]
  53. Jean, C.; Jankovic, M.; Stal-Le Cardinal, J.; Bocquet, J.C. Predictive modelling of telehealth system deployment. J. Simul. 2015, 9, 182–194. [Google Scholar] [CrossRef]
  54. Jin, X.B.; Gong, W.T.; Kong, J.L.; Bai, Y.T.; Su, T.L. PFVAE: A planar flow-based variational auto-encoder prediction model for time series data. Mathematics 2022, 10, 610. [Google Scholar] [CrossRef]
Figure 1. Model structure of the combined normalization codec (CNC).
Figure 2. Flow chart of the combined normalized encoder with three normalization methods.
Figure 3. Process description of the attention mechanism attending to the data features.
Figure 4. Flow chart of the combined renormalization decoder with the inverse transform for z-score, Interval, and Max.
Figure 5. Data comparison of water inlet and outlet. (a) Chemical oxygen demand, suspended solids, total nitrogen, and total phosphorus detected at the water inlet. (b) Chemical oxygen demand, suspended solids, total nitrogen, and total phosphorus detected at the outlet.
Figure 6. Comparison of predicted and actual values given by the model, based on four pollutant indicators. (a) Chemical oxygen demand, (b) suspended solids, (c) total nitrogen, (d) total phosphorus. The last orange-red band is the actual ground-truth value, and the prediction results of all methods are compared using dashed lines. It can be seen that the red band (the method proposed in this paper) is the closest to the actual value.
Table 1. Comparison of evaluation indexes based on prediction results of actual brewery wastewater pollutant index data.
| Model | RMSE [48] | MAE [49] | MAPE [50] | R [51] |
|---|---|---|---|---|
| ANN [41] | 4.5633 | 3.3221 | 1.0059 | 0.9722 |
| DNN [42] | 4.5525 | 3.3194 | 0.9983 | 0.9723 |
| LSTM [43] | 4.4786 | 3.2571 | 1.0215 | 0.9733 |
| GRU [44] | 4.4888 | 3.2808 | 1.0135 | 0.9731 |
| Attention_LSTM [45] | 4.4478 | 3.2330 | 1.0086 | 0.9735 |
| Attention_GRU [46] | 4.4221 | 3.2171 | 1.0121 | 0.9738 |
| Codec [47] | 4.4221 | 3.2171 | 1.0121 | 0.9738 |
| The proposed CNC | 4.3547 | 3.1126 | 1.0071 | 0.9749 |