Fault Detection of Wastewater Treatment Plants Based on an Improved Kernel Extreme Learning Machine Method

Abstract: To ensure stable operation, improve efficiency, and enhance the sustainability of wastewater treatment systems, this paper investigates fault detection in the wastewater treatment process based on an improved kernel extreme learning machine method. First, a kernel extreme learning machine (KELM) model optimized by an improved mutation bald eagle search (IMBES) optimizer is proposed to generate point predictions of effluent quality parameters. Then, based on the point prediction results, the confidence interval of the effluent quality parameters is calculated using the kernel density estimation (KDE) method. This interval represents the bounds of system uncertainty and unknown disturbance under normal conditions and can be treated as the threshold for fault diagnosis. Finally, the effectiveness of the proposed method is illustrated on two datasets obtained from the BSM1 wastewater simulation platform and an actual water platform. Experimental results show that, compared with other methods such as CNN, LSTM, and IBES-LSSVM, the proposed method significantly improves prediction accuracy and, at the same confidence level, maintains the fault detection rate while generating narrower confidence intervals.


Introduction
In recent decades, the water industry has undergone significant changes due to the rapid development of new-generation information technologies such as the Internet of Things (IoT), cloud computing and big data. Building safe and efficient water systems and developing smart water technologies have become an inevitable trend. The use of IoT technology in smart water systems has gained much attention and provides strong technical support for intelligent operation and management. Moreover, in response to increasingly prominent environmental problems, countries have developed strict discharge standards for wastewater treatment plants (WWTPs). As a result, most WWTPs are equipped with many redundant hardware facilities to ensure plant safety. Timely monitoring and fault diagnosis of effluent water quality have therefore become crucial issues and have produced significant research outcomes [1][2][3].
In general, current fault diagnosis methods can be broadly divided into three categories: model-based, data-driven and knowledge-based methods [4,5]. One of the most representative mechanism models is the activated sludge model, developed since 1987 by the International Association on Water Pollution Research and Control. It has helped in understanding the mechanism of wastewater treatment processes and has provided good guidance for wastewater treatment design. However, this model has certain limitations, such as difficult parameter estimation and poor prediction performance. Knowledge-based methods, on the other hand, rely heavily on the expertise of domain experts, which somewhat hinders their practical application [6].
Data-driven fault diagnosis methods for WWTPs have gained significant attention in recent years due to advances in data-driven technology [7]. They mainly include statistical analysis, signal analysis and machine learning methods. Statistical analysis methods use stochastic processes, statistical reasoning and other mathematical theories to build prediction models, including Wiener processes, Gaussian processes, inverse Gaussian processes, and others. Currently, multivariate statistical process monitoring approaches such as principal component analysis (PCA), partial least squares (PLS), and their extensions, such as adaptive PCA, dynamic PCA, KPCA, KPLS and others, have been applied to fault diagnosis for WWTPs [1]. Fault detection methods based on signal analysis mainly use wavelet transform and Fourier transform theory to analyze the characteristics of vibration and pressure signals. They are used to diagnose pump station faults, pipeline blockages, fan failures, and other problems in the wastewater treatment process.
The concentration of effluent water quality parameters is a crucial factor in measuring water quality in wastewater treatment processes, and machine learning is a powerful tool for predicting these effluent indicators owing to its strong nonlinear fitting ability. By using machine learning to forecast these indicators accurately, potential malfunctions in the treatment process can be detected and identified with greater precision and speed. Hence, machine learning based fault detection has become a hot topic and attracted significant attention [8,9]. Generally, machine learning techniques can be categorized into shallow machine learning and deep learning, depending on the amount of data needed for training. Deep learning techniques rely heavily on data and tend to perform better with large datasets, whereas shallow machine learning is more appropriate for predicting effluent water quality concentrations when only a small amount of data is available.
Among the existing techniques, the extreme learning machine (ELM) is relatively easy to implement, which makes it a popular choice for a wide range of prediction tasks. To account for the nonlinearity and time variability of WWTPs, a dynamic KELM model is proposed in [10] to predict key quality indices such as effluent chemical oxygen demand (COD). In [11], an ELM black-box model combined with PCA was developed to predict the effluent BOD, effluent COD and other effluent indicators of a WWTP. Ref. [12] reviews the metabolic characteristics, microbial communities, and process applications of different types of ammonia-oxidizing microorganisms in wastewater treatment systems, and discusses the future development of nitrogen removal processes using these microorganisms. Ref. [13] showed that deep learning algorithms, especially LSTM-based models, outperformed the ELM model in terms of accuracy, robustness against overfitting, and capturing extreme hydrologic events such as no-flow events and extreme floods. In [14], an algorithm combining ELM and KPCA is proposed for process monitoring of a water treatment plant. Ref. [15] proposes a KPCA-ELM model for forecasting the inlet COD and BOD concentrations in wastewater treatment, which can be used to adjust control parameters for better control of the WWTP, and which outperforms the compared approaches in forecasting capacity and accuracy. These studies demonstrate the effectiveness of ELM and its extensions, especially KELM, in predicting effluent water quality parameters in WWTPs.
The kernel parameters of KELM have a significant impact on prediction performance, so selecting appropriate values is critical. Swarm intelligence optimization algorithms have shown great potential in improving the performance of machine learning methods by finding the optimal solution through collaborative search mechanisms. In recent years, many researchers have combined swarm intelligence optimization algorithms with machine learning methods to solve various prediction and optimization problems [16]. Commonly used methods such as particle swarm optimization (PSO) [17], artificial bee colony optimization (ABC) [18], the grey wolf optimizer (GWO) [19], bald eagle search (BES) [20] and many other swarm intelligence optimization algorithms have been applied successfully to the wastewater treatment process. These studies demonstrate the effectiveness and versatility of combining swarm intelligence optimization algorithms with machine learning methods.
Due to the unavoidable influence of unknown parameters and measurement noise in WWTPs, interval estimation based on the point estimation result has become an important technique for evaluating unknown disturbances. In [21], the VBELM model demonstrated superior performance for multi-scale groundwater level forecasting, with lower uncertainty and more observed values within the confidence interval, compared to the single ELM model and the VELM model. Non-parametric methods such as kernel density estimation (KDE) can be used to estimate the distribution of prediction errors, from which prediction intervals can be generated. The advantage of the KDE method is that it does not require prior knowledge about the data distribution and can capture the distribution characteristics from the sample data itself [22]. Therefore, the use of the KDE method in fault diagnosis can reduce the interference of other factors and improve the reliability of the diagnosis. Ref. [23] proposes a method that tracks the time-varying statistical error of KPCA based on generalized likelihood ratio (GLR) statistics and approximates the upper control limit by KDE to realize the operation of a KPCA-based variational system; the results show that the SSGLR system based on KPCA-KDE is superior to kernel locality preserving projection (KLPP) in fault identification. Ref. [24] proposes an interpolated KDE (IKDE) to reduce computational cost and improve processing time. However, to the best of the authors' knowledge, little attention has been paid to KDE-based interval generation for WWTPs, which motivates this work.
In this paper, to address fault detection in wastewater treatment plants, important variables are collected online, the effluent variables are analyzed, and the relationships between them and the easily measured variables are identified. A prediction model is then established so that the effluent variables can be measured indirectly.
The point prediction of effluent indices can be obtained with many existing regression approaches. However, point prediction by itself may not be sufficient to account for the uncertainties inherent in system operation. To obtain a numerical estimate and evaluate its reliability, it is crucial to calculate prediction intervals in practical applications.
Interval prediction methods use the statistical properties of prediction errors to estimate upper and lower bounds, which can serve as thresholds for fault detection decision-making. Under normal conditions, the effluent variable values should fall within the estimated bounds. Deviation from these bounds indicates the occurrence of a fault. Hence, the main objective of this work is to generate the interval prediction result and then achieve fault detection based on it.
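The decision rule described above can be illustrated with a minimal sketch. It assumes that lower and upper interval bounds are already available for each sample; the function name `detect_faults` and all numbers are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def detect_faults(y_true: Sequence[float],
                  lower: Sequence[float],
                  upper: Sequence[float]) -> List[bool]:
    """Flag sample k as faulty when y_true[k] lies outside [lower[k], upper[k]]."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("all sequences must have the same length")
    return [not (lo <= y <= up) for y, lo, up in zip(y_true, lower, upper)]


# Hypothetical effluent readings and interval bounds (made-up numbers).
measured = [10.2, 10.4, 13.9, 10.1]
lower_bounds = [9.5, 9.6, 9.7, 9.6]
upper_bounds = [11.0, 11.1, 11.2, 11.1]
flags = detect_faults(measured, lower_bounds, upper_bounds)
print(flags)
```

Only the third reading leaves its interval, so only that sample is flagged.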
This paper focuses on real-time monitoring of wastewater quality variables in WWTPs. An IMBES-KELM-KDE method is proposed to ensure effective estimation of sensor measurements and to realize reliable monitoring and fault diagnosis. Aimed at online prediction and fault detection of WWTP effluent within an intelligent data-driven framework, the main contributions of this work are as follows:
• Since the BES algorithm easily falls into local optima, a mutation strategy is introduced to improve the traditional BES algorithm. In the initialization, a tent chaos strategy is used to optimize the initial position of the prey. Furthermore, the mutation strategy is used in the three stages of the BES algorithm to update the positions. Compared with existing swarm intelligence optimization models (such as PSO, WOA, IBES, GWO, SSA), the proposed method achieves higher prediction accuracy. The proposed IMBES optimization algorithm is then used to find the optimal parameters of KELM.
• To estimate the uncertainty of the model prediction results and support better decisions, the KDE method is used to generate the interval prediction boundary of effluent quality after the point prediction results are obtained.
• According to the fault factors in different environments, the proposed model is used for fault diagnosis of the wastewater treatment process. Comparisons with previous methods show that the generated interval is more sensitive and can accurately diagnose the occurrence of faults.
The structure of this paper is as follows: Section 2 describes the proposed IMBES-KELM-KDE model to generate the interval prediction. In Section 3, simulation examples are depicted and the feasibility of this method is verified in three different fault environments. Section 4 draws conclusions of this paper.

Materials and Methods
In this paper, an IMBES-KELM-KDE method is proposed for fault diagnosis of wastewater treatment plants. The schematic of the proposed method is shown in Figure 1.

KELM Model
KELM is an improved model of ELM and has good non-linear fitting ability and fast learning speed [25]. The number of KELM's hidden layer nodes does not need to be given, and the unknown nonlinear feature mapping of hidden layer is represented by kernel function, which can enhance the generalization ability and stability of ELM.
To predict the effluent water quality index, the wastewater treatment sample data are mapped to a higher-dimensional space through a nonlinear transformation, and a linear function is fitted in this high-dimensional feature space. Suppose the ELM model has $n$ input variables, $q$ hidden-layer neurons and $m$ output variables. In the ELM output equation, $x$ denotes the input and $y_l$ the $l$-th output variable, $h(\cdot)$ is the activation function, $w_i$ denotes the randomly set weights from the input nodes to the hidden nodes, $b_i$ is the bias variable, and $\beta_j$ is the weight between the hidden nodes and the output nodes. The hidden-layer output matrix is denoted by $H$ (Equation (2)) and the output weight matrix by $\beta \in \mathbb{R}^{q \times 1}$, where $y = [y_1, \cdots, y_j, \cdots, y_n]^T$ represents the data affected by water outlet failure.
Since the input weight $w$ and hidden neuron threshold $b$ are randomly generated in ELM, the performance of the method is unstable. To improve accuracy and stability, a kernel function is introduced to replace the random feature mapping of ELM, which yields the KELM method. The term $I/C$ is added to the diagonal of the matrix $HH^T$ so that its eigenvalues are nonzero, which makes KELM more stable and improves its generalization. In the KELM model, $K$ represents the kernel function, which is usually set to the RBF kernel, $C$ represents the regularization coefficient, and $\sigma$ represents the kernel parameter. The optimal $C$ and $\sigma$ are difficult to select, yet they have a great impact on the prediction performance of KELM. In this paper, a hybrid IMBES-KELM model is proposed, in which $C$ and $\sigma$ are optimized by the IMBES method.
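As an illustration of the KELM model described above, the following minimal sketch uses the standard single-output kernel ELM form, $f(x) = [K(x, x_1), \cdots, K(x, x_N)] (I/C + \Omega)^{-1} y$ with $\Omega_{ij} = K(x_i, x_j)$. The class name `KELM`, the helper `solve_linear`, the $2\sigma^2$ scaling of the RBF kernel and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """Gaussian RBF kernel; the 2*sigma^2 scaling is one common convention."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


class KELM:
    """Single-output kernel ELM: alpha = (Omega + I/C)^-1 y, f(x) = k(x)^T alpha."""

    def __init__(self, C: float, sigma: float) -> None:
        self.C = C
        self.sigma = sigma
        self.X: List[List[float]] = []
        self.alpha: List[float] = []

    def fit(self, X: Sequence[Vector], y: Sequence[float]) -> "KELM":
        n = len(X)
        # The kernel matrix plays the role of H H^T; I/C regularises its diagonal.
        omega = [[rbf_kernel(X[i], X[j], self.sigma) + (1.0 / self.C if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        self.X = [list(x) for x in X]
        self.alpha = solve_linear(omega, list(y))
        return self

    def predict(self, x: Vector) -> float:
        return sum(a * rbf_kernel(x, xi, self.sigma)
                   for a, xi in zip(self.alpha, self.X))


# Toy data (not from the paper): learn sin(2*pi*x) on 11 grid points.
X_train = [[i / 10.0] for i in range(11)]
y_train = [math.sin(2.0 * math.pi * x[0]) for x in X_train]
model = KELM(C=1000.0, sigma=0.2).fit(X_train, y_train)
print(round(model.predict([0.25]), 3))
```

In this form only $C$ and $\sigma$ need to be chosen, which is why the pair $(C, \sigma)$ is a natural search vector for a swarm optimizer.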

IMBES Optimizer
The bald eagle search (BES) optimization algorithm was proposed by Alsattar et al. in 2020 [26]. It simulates the hunting strategy and intelligent social behavior of bald eagles when looking for prey. The algorithm can be divided into three parts, namely the select stage, the search stage and the swoop stage (Figure 2).

Tent Chaos Strategy
In this paper, to improve the quality of the initial solution, a tent chaos strategy is used to optimize the initial position of the prey, and a linear decline method is employed to improve the control parameters with which the bald eagles iteratively update their positions. This approach helps to identify optimal model parameters and improve the fitting quality. In the tent chaos mapping function, $P_i$ is the location of the $i$-th bald eagle and $\lambda$ belongs to $[0, 1]$.
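A tent-map initialization can be sketched as follows. The skewed tent map used here, $z_{k+1} = z_k/\lambda$ for $z_k < \lambda$ and $z_{k+1} = (1 - z_k)/(1 - \lambda)$ otherwise, is one common textbook form and is assumed for illustration; the paper's exact mapping may differ. The function names, the seed value `z0`, the choice $\lambda = 0.7$ and the search ranges for $(C, \sigma)$ are hypothetical.

```python
from typing import List, Sequence


def tent_map(z: float, lam: float = 0.7) -> float:
    """One step of a skewed tent map on [0, 1]; requires 0 < lam < 1."""
    if z < lam:
        return z / lam
    return (1.0 - z) / (1.0 - lam)


def tent_init(pop_size: int,
              lower: Sequence[float],
              upper: Sequence[float],
              z0: float = 0.37,
              lam: float = 0.7) -> List[List[float]]:
    """Spread pop_size agents over [lower, upper] using a tent-map sequence."""
    population: List[List[float]] = []
    z = z0
    for _ in range(pop_size):
        agent = []
        for lo, up in zip(lower, upper):
            z = tent_map(z, lam)               # next chaotic value in [0, 1]
            agent.append(lo + z * (up - lo))   # scale to the search range
        population.append(agent)
    return population


# Hypothetical search ranges for the KELM parameters (C, sigma).
population = tent_init(5, lower=[0.1, 0.01], upper=[100.0, 10.0])
for agent in population:
    print([round(v, 3) for v in agent])
```

Compared with uniform random sampling, the chaotic sequence is deterministic for a given `z0`, which makes initial populations reproducible.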

Select Stage
Bald eagles select search areas randomly and assess the number of prey to determine the best locations. During this stage, the bald eagle's position $P_{i,new}$ is updated through a random search of prior information multiplied by the control parameter $C_1$. In the corresponding update equation, $C_1$ is a parameter controlling the position change, and rand is a random number in $(0, 1)$. $P_{best}$ is the current optimal location, and $P_{mean}$ is the average location of the bald eagles after the previous search.
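The select stage can be sketched using the update of the original BES algorithm, $P_{i,new} = P_{best} + C_1 \cdot rand \cdot (P_{mean} - P_i)$, which matches the symbols defined above. Drawing a fresh random number for every coordinate and the default $C_1 = 1.8$ (taken from the settings reported later in the paper) are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence


def select_stage(population: Sequence[Sequence[float]],
                 best: Sequence[float],
                 c1: float = 1.8,
                 rng: Optional[random.Random] = None) -> List[List[float]]:
    """Move each agent around P_best: P_new = P_best + c1 * rand * (P_mean - P_i)."""
    if rng is None:
        rng = random.Random()
    dim = len(best)
    mean = [sum(p[d] for p in population) / len(population) for d in range(dim)]
    return [[best[d] + c1 * rng.random() * (mean[d] - p[d]) for d in range(dim)]
            for p in population]


# Made-up two-dimensional population and current best position.
pop = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
new_pop = select_stage(pop, best=[2.0, 3.0], rng=random.Random(0))
print(new_pop)
```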

Search Stage
Bald eagles use a spiral search pattern to efficiently locate prey within their selected search space. The mathematical model used to describe this spiral flight behavior updates the position using the polar equation. Specifically, the new position $P_{i,new}$ of the bald eagle is determined by adding to the current position $P_i$ the product of the spiral factor $r(i)$ and the sine and cosine functions of the polar angle $\theta(i)$. The polar angle is calculated based on the iteration step and the maximum number of iterations, which ensures that the spiral search pattern covers the entire search space. The resulting equations include:

$$x_r(i) = r(i) \cdot \sin[\theta(i)]$$

$$x(i) = \frac{x_r(i)}{\max(|x_r|)} \quad (14)$$

$$y(i) = \frac{y_r(i)}{\max(|y_r|)} \quad (15)$$

where $\theta(i)$ and $r(i)$ are the polar angle and polar diameter of the spiral equation, respectively. $\omega$ and $C_2$ are the parameters controlling the spiral trajectory. $x(i)$ and $y(i)$ are the bald eagle positions in polar coordinates and take values in $(-1, 1)$. In the bald eagle position update, $P_{i+1}$ is the next update position of the $i$-th bald eagle.
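For illustration, the search stage can be sketched with the formulation of the original BES algorithm: $\theta(i) = \omega \pi \cdot rand$, $r(i) = \theta(i) + C_2 \cdot rand$, $y_r(i) = r(i) \cos[\theta(i)]$ and $P_{i,new} = P_i + y(i)(P_i - P_{i+1}) + x(i)(P_i - P_{mean})$. This is an assumption: the improved variant in this paper computes the polar angle from the iteration count, which is not reproduced here. The function names, the wrap-around neighbour for the last agent, and the default $\omega$ are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def spiral_offsets(n: int, omega: float, c2: float,
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Normalised spiral coordinates x(i), y(i) in [-1, 1] for n agents."""
    theta = [omega * math.pi * rng.random() for _ in range(n)]
    r = [t + c2 * rng.random() for t in theta]
    xr = [ri * math.sin(ti) for ri, ti in zip(r, theta)]
    yr = [ri * math.cos(ti) for ri, ti in zip(r, theta)]
    max_x = max(abs(v) for v in xr) or 1.0  # guard against division by zero
    max_y = max(abs(v) for v in yr) or 1.0
    return [v / max_x for v in xr], [v / max_y for v in yr]


def search_stage(population: Sequence[Sequence[float]],
                 omega: float = 5.0,
                 c2: float = 1.0,
                 rng: Optional[random.Random] = None) -> List[List[float]]:
    """P_new = P_i + y(i) * (P_i - P_next) + x(i) * (P_i - P_mean)."""
    if rng is None:
        rng = random.Random()
    n, dim = len(population), len(population[0])
    mean = [sum(p[d] for p in population) / n for d in range(dim)]
    x, y = spiral_offsets(n, omega, c2, rng)
    new_population = []
    for i, p in enumerate(population):
        nxt = population[(i + 1) % n]  # neighbour; wraps around for the last agent
        new_population.append([p[d] + y[i] * (p[d] - nxt[d]) + x[i] * (p[d] - mean[d])
                               for d in range(dim)])
    return new_population


pop = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0], [0.5, 1.0]]
print(search_stage(pop, rng=random.Random(0)))
```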

Swoop Stage
The bald eagle's final stage of hunting involves quickly swooping down on the target prey from the best spot in the search space, while the rest of the population simultaneously moves to the best spot and attacks the prey. The motion is still described by the polar equation, with normalized coordinates given by:

$$x_1(i) = \frac{x_r(i)}{\max(|x_r|)} \quad (21)$$

$$y_1(i) = \frac{y_r(i)}{\max(|y_r|)} \quad (22)$$

In the position update formula of the bald eagle during the dive, $C_3$ and $C_4$ are the intensities with which the bald eagle moves towards the optimal and central positions, and their values lie in $(1, 2)$.
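The swoop stage can be sketched in the same way, again assuming the formulation of the original BES algorithm: $\theta(i) = \omega \pi \cdot rand$, $r(i) = \theta(i)$, $x_r(i) = r(i) \sinh[\theta(i)]$, $y_r(i) = r(i) \cosh[\theta(i)]$ and $P_{i,new} = rand \cdot P_{best} + x_1(i)(P_i - C_3 P_{mean}) + y_1(i)(P_i - C_4 P_{best})$. The defaults $C_3 = C_4 = 1.5$ follow the settings reported later in the paper; the function name and the default $\omega$ are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def swoop_stage(population: Sequence[Sequence[float]],
                best: Sequence[float],
                omega: float = 5.0,
                c3: float = 1.5,
                c4: float = 1.5,
                rng: Optional[random.Random] = None) -> List[List[float]]:
    """Dive towards P_best using hyperbolic polar coordinates."""
    if rng is None:
        rng = random.Random()
    n, dim = len(population), len(best)
    mean = [sum(p[d] for p in population) / n for d in range(dim)]
    theta = [omega * math.pi * rng.random() for _ in range(n)]
    xr = [t * math.sinh(t) for t in theta]  # r(i) = theta(i) in this stage
    yr = [t * math.cosh(t) for t in theta]
    max_x = max(abs(v) for v in xr) or 1.0  # guard against division by zero
    max_y = max(abs(v) for v in yr) or 1.0
    new_population = []
    for i, p in enumerate(population):
        x1, y1 = xr[i] / max_x, yr[i] / max_y
        new_population.append([rng.random() * best[d]
                               + x1 * (p[d] - c3 * mean[d])
                               + y1 * (p[d] - c4 * best[d])
                               for d in range(dim)])
    return new_population


pop = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
print(swoop_stage(pop, best=[2.0, 3.0], rng=random.Random(0)))
```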

Mutation Strategy
To address the potential issue of the BES algorithm getting trapped in local optima, a mutation operation [27] is introduced into the algorithm. When bald eagles are close to the historical optimal position for targeting prey, prey positions that meet certain mutation conditions are mutated to increase population diversity and improve the global optimization capability of the algorithm. In the mutation operation, $P_{i,mutation}$ is the bald eagle position after the mutation, and $p_c$ represents the variation factor, whose value range is the smallest definition domain among all prey positions. $p_m$ represents the variation rate. $P_{i,max}$ represents the maximum of the set search range, $P_{i,min}$ represents the minimum of the set search range, and $P_{i,max} - P_{i,min}$ represents the domain of the prey location.
The position of the prey is mutated according to the variation rate $p_m$. In the early iterations, the BES algorithm mainly relies on its own search characteristics and adopts a small variation rate. As the number of iterations increases, the diversity of the population deteriorates. In the formula for calculating the variation rate, $\sigma$ represents the initial variation rate, $i$ represents the current iteration number and $N$ represents the maximum number of iterations.

Interval Generated by KDE Method
In this section, the kernel density estimation (KDE) method is used to estimate the prediction error probability density curve of effluent data [28]. KDE is a non-parametric estimation method that directly estimates the density of a random variable without distribution assumptions. This universal approach produces a fitted probability density function that is closer to the real information.
Consider a set of wastewater treatment effluent prediction errors $E = \{e_1, e_2, \cdots, e_n\}$, where $n$ is the number of effluent prediction error samples. Based on the principle of KDE, the probability density function of the effluent prediction error is estimated from these samples. In this estimate, $e_j$ is the $j$-th prediction error sample, $f(e, \gamma)$ represents the KDE of the prediction errors, $\gamma$ is the bandwidth that determines the interval division of the error data distribution, and $K(\cdot)$ denotes the kernel function. In this work, the effluent prediction error $e$ and the bandwidth $\gamma$ are assumed to be independent. The choice of kernel function can have a significant impact on the resulting KDE estimate. In this paper, the Gaussian kernel function is selected, and the non-parametric KDE is obtained by substituting it into the density estimate. The probability interval estimation of the effluent prediction error can be divided into three steps: (1) calculating the prediction error of the WWTP based on the IMBES-KELM model; (2) obtaining the quantiles at the confidence level of $(1 - \mu)\%$ based on the KDE method; (3) generating the interval prediction values of the effluent quality parameters.
Given the confidence level of $(1 - \mu)\%$, the lower and upper quantiles $F_{\mu/2}$ and $F_{1-\mu/2}$ can be obtained. The effluent data interval $\tau = [L_\mu, U_\mu]$ at the confidence level of $(1 - \mu)\%$ is then constructed from these quantiles and the point prediction, where $\hat{y}$ represents the point prediction value of the effluent data, and $L_\mu$ and $U_\mu$ are the lower and upper boundaries of the predicted interval at the confidence level of $(1 - \mu)\%$, respectively. Next, to evaluate the performance of the generated interval, the following evaluation criteria are analysed [29].
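The three steps above can be sketched with a Gaussian KDE, $f(e, \gamma) = \frac{1}{n\gamma} \sum_{j=1}^{n} K\left(\frac{e - e_j}{\gamma}\right)$ with $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$. The sketch makes several assumptions that are not stated in the text: errors are defined as true value minus prediction, $\mu$ is given as a fraction (0.05 for a 95% level), the interval is $[\hat{y} + F_{\mu/2}, \hat{y} + F_{1-\mu/2}]$, and the quantiles are found by bisection on the KDE cumulative distribution. All function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()


def kde_pdf(e: float, errors: Sequence[float], gamma: float) -> float:
    """Gaussian KDE density f(e, gamma) with bandwidth gamma."""
    return sum(_STD_NORMAL.pdf((e - ej) / gamma) for ej in errors) / (len(errors) * gamma)


def kde_cdf(e: float, errors: Sequence[float], gamma: float) -> float:
    """Cumulative distribution of the Gaussian KDE (mean of normal CDFs)."""
    return sum(_STD_NORMAL.cdf((e - ej) / gamma) for ej in errors) / len(errors)


def kde_quantile(p: float, errors: Sequence[float], gamma: float,
                 tol: float = 1e-9) -> float:
    """Invert the KDE CDF by bisection to obtain the p-quantile."""
    lo = min(errors) - 10.0 * gamma
    hi = max(errors) + 10.0 * gamma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, errors, gamma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kde_interval(y_hat: Sequence[float], errors: Sequence[float],
                 gamma: float, mu: float = 0.05) -> List[Tuple[float, float]]:
    """Shift each point prediction by the mu/2 and 1 - mu/2 error quantiles."""
    q_low = kde_quantile(mu / 2.0, errors, gamma)
    q_high = kde_quantile(1.0 - mu / 2.0, errors, gamma)
    return [(y + q_low, y + q_high) for y in y_hat]


# Made-up training errors and two new point predictions.
errors = [-1.0, -0.5, 0.0, 0.5, 1.0]
intervals = kde_interval([10.0, 12.0], errors, gamma=0.3, mu=0.05)
print([(round(lo, 3), round(up, 3)) for lo, up in intervals])
```

Because the error quantiles are computed once from normal-operation data, every prediction receives an interval of the same width; a narrower error distribution directly yields a tighter detection threshold.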

Prediction Interval Coverage Probability (PICP): the proportion of true values that fall between the lower and upper bounds of the prediction interval. In the PICP definition, $N$ is the number of prediction points. If the true value is within the range $[\underline{Y}, \overline{Y}]$, $\pi_k$ is 1; otherwise, $\pi_k$ is 0. If all true values are included in the prediction interval, PICP = 100%. Hence, a larger PICP value indicates better monitoring performance.
Prediction Interval Normalized Averaged Width (PINAW): measures the width of the generated interval. For the same PICP, a smaller PINAW corresponds to a narrower generated interval. In the PINAW definition, $R$ is the range of the true values.
All Interval Score (AIS): a practical tool that jointly considers coverage and interval width. An interval score $S_k$ is defined for the $k$-th forecast interval; when the target is not within the coverage range, a penalty is applied. The overall AIS aggregates the interval scores over all forecast intervals. A higher AIS value indicates a better quality of the prediction interval. In practical applications, it is often desirable to obtain a narrow prediction interval while maintaining a high coverage probability. However, the coverage probability and the interval width may conflict with each other.
Coverage Width-based Criterion (CWC): a comprehensive index of the prediction interval coverage and the interval width. In the CWC definition, $\tau$ and $\mu$ are constants, and PINC is the set confidence level. When a fault occurs, the true value lies outside the predicted interval. In this paper, CWC is selected as the fitness function of the whole algorithm. The overall pseudocode of the proposed algorithm is given in Algorithm 1.
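The four criteria can be sketched using definitions that are common in the interval-prediction literature: PICP $= \frac{1}{N} \sum_k \pi_k$, PINAW $= \frac{1}{NR} \sum_k (U_k - L_k)$, an interval score $S_k = -2\mu(U_k - L_k)$ with an additional penalty of four times the distance to the violated bound, and CWC $=$ PINAW $\cdot (1 + \gamma e^{-\tau(\mathrm{PICP} - \mathrm{PINC})})$ with $\gamma = 0$ when PICP reaches PINC and $\gamma = 1$ otherwise. These forms are assumptions; the paper's exact equations and constants may differ. Function names, the default $\tau = 50$ and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def picp(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of true values inside their prediction interval."""
    hits = sum(1 for yk, lo, up in zip(y, lower, upper) if lo <= yk <= up)
    return hits / len(y)


def pinaw(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average interval width normalised by the range R of the true values."""
    value_range = max(y) - min(y)
    return sum(up - lo for lo, up in zip(lower, upper)) / (len(y) * value_range)


def ais(y: Sequence[float], lower: Sequence[float], upper: Sequence[float],
        mu: float) -> float:
    """Average interval score; values are non-positive, closer to zero is better."""
    total = 0.0
    for yk, lo, up in zip(y, lower, upper):
        score = -2.0 * mu * (up - lo)
        if yk < lo:
            score -= 4.0 * (lo - yk)   # penalty for falling below the interval
        elif yk > up:
            score -= 4.0 * (yk - up)   # penalty for exceeding the interval
        total += score
    return total / len(y)


def cwc(picp_value: float, pinaw_value: float, pinc: float, tau: float = 50.0) -> float:
    """Width criterion that penalises coverage below the nominal level PINC."""
    gamma = 0.0 if picp_value >= pinc else 1.0
    return pinaw_value * (1.0 + gamma * math.exp(-tau * (picp_value - pinc)))


# Made-up targets and interval bounds.
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.5, 3.5]
upper = [1.5, 2.5, 4.5, 4.5]
coverage = picp(y_true, lower, upper)
width = pinaw(y_true, lower, upper)
print(coverage, round(width, 4), round(ais(y_true, lower, upper, mu=0.1), 4))
print(round(cwc(coverage, width, pinc=0.9), 2))
```

Under this form, CWC equals PINAW whenever the coverage target is met, so minimizing it as a fitness function favours the narrowest interval that still reaches the nominal confidence level.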

Begin
  Initialize point P_i by the tent chaos strategy according to Equation (8);
  Set the KELM parameters C and σ, and compute the point prediction by Equations (4)-(6);
  Compute the kernel density estimation interval by Equations (28) and (29);
  Calculate the fitness of the search agent by Equations (30)-(35), and obtain the optimal solution P_best;
  t = 1;
  while (t ≤ t_max) do
    for i = 1 to N
      Update the search area position P_i by Equation (8).
      Apply the mutation strategy to the position P_i by Equations (24) and (25).
    End for
    Evaluate the current population. Update the optimal solution P_best.
    for i = 1 to N
      Update the bald eagle position P_i by Equations (9)-(16).
      Apply the mutation strategy to the position P_i by Equations (24) and (25).
    End for
    Compare the fitness function. Update the optimal solution P_best.
    for i = 1 to N
      Update the bald eagle dive position P_i by Equations (17)-(23).
      Apply the mutation strategy to the position P_i by Equations (24) and (25).
    End for
    Compare the fitness function. Update the optimal solution P_best.
    t = t + 1
  end while
  Acquire the optimal parameters from the optimal solution P_best.
  Return the KELM optimal parameters C and σ.
  Return the kernel density estimation optimal interval.

Simulation Results
In this section, to validate the effectiveness of the proposed IMBES algorithm, it is compared with commonly used swarm intelligence optimization algorithms (such as PSO, WOA, GWO, SSA, and IBES) on several commonly used functions listed in Table 1. In this study, MATLAB (version 2020a) is used for data preprocessing and machine learning model building. The software used in this study is widely used in data analysis and machine learning applications, showing excellent performance and versatility in various research fields.

Table 1. Benchmark function settings. Range: [0, 10]. Parameters: dim = 4, popsize = 100, iteration = 300.
The final statistical results are shown in Table 2. Moreover, the iterative process is depicted in Figures 3-7. From the results, it can be seen that the IMBES algorithm proposed in our work exhibits a superior convergence rate and also provides competitive results on benchmark functions compared with other existing algorithms.
Next, to verify the feasibility and superiority of the proposed method in fault detection of WWTPs, it is tested on two sets of data collected from the BSM1 wastewater treatment simulation platform and an actual wastewater treatment simulation platform, respectively.
Sludge bulking and toxic shock are two common process faults in WWTPs. In the BSM1 model, different levels of sludge bulking fault can be represented by adjusting µH. In addition, sensor faults are a very common type of equipment failure. In this paper, a sludge bulking fault, a toxic shock fault, and a sensor fault were established separately. Detailed descriptions of these faults are presented in Table 3. First, 1344 sets of data are collected on BSM1 from day 1 to day 14. They are divided into two parts. The first part comprises 672 groups and is used to construct the model. The second part also consists of 672 groups and is used for prediction based on the constructed model. The effluent auxiliary variables of the BSM1 simulation platform used in this work are presented in Table 4. Similarly, 365 sets of data are collected on the actual simulation platform and divided into two parts. The first part includes 305 groups and the second part includes 60 groups. The corresponding variables are presented in Table 5.
Tables 6 and 7 present the interval prediction indices for the sludge bulking fault and the toxic shock fault, respectively. The results illustrate that the proposed IMBES-KELM-KDE model is more effective in terms of prediction accuracy than the compared models, such as CNN, LSTM, SSA-KELM, and IBES-KELM. Under the same confidence level, the PICP index of all models is close to the set confidence level, which supports the validity of the experimental setup. The PINAW index shows that the IMBES-KELM model generates a smaller confidence interval. In terms of the comprehensive evaluation index CWC, the proposed model is also competitive.
The interval prediction results are shown in Figures 8 and 9, in which the solid and dashed lines represent the true and predicted values, respectively, and the dotted lines represent the boundaries generated by KDE. Figures 8 and 9 show that the curve of the true values falls outside the interval after the seventh day, indicating that a fault occurs after this time, which is consistent with the fault environment designed for the study.
In addition, to evaluate the effectiveness of the method in addressing sensor faults, a similar procedure is applied in a real simulation environment. In this situation, the initialization conditions are set as: iter = 672, $n = 30$, $w_{max} = 10$, $w_{min} = 0$, $C_1 = 1.8$, $C_2 = 1$, $C_3 = 1.5$, $C_4 = 1.5$, $\sigma = 0.1$, $\gamma = 2$. The fault diagnosis rate (FDR) and false alarm rate (FAR) are selected to evaluate the effectiveness of the proposed method for sensor faults in a real simulation environment.
In the FAR definition, FD is the number of times the system indicates a fault when there is none, and TD is the sum of true alarms and false alarms. In the FDR definition, FA is the number of times the system correctly identifies a fault, and TA is the sum of correct diagnoses and missed diagnoses.
From the results presented in Table 8, it is evident that the proposed method outperforms existing models such as CNN, LSTM, SSA-KELM, IBES-LSSVM, and IBES-KELM in terms of prediction accuracy. Figure 10 also shows that the true value curve lies outside the interval between samples 305 and 365, which means the fault is detected in this period. Furthermore, Table 9 compares FAR and FDR, revealing that the IMBES-KELM-KDE model detects the occurrence of faults more accurately and with less hysteresis than the CNN-KDE, LSTM-KDE, SSA-KELM-KDE, IBES-LSSVM-KDE, and IBES-KELM-KDE models. The PICP index shows that all models reach the set confidence level. The PINAW and CWC indices show that the IMBES-KELM method generates more sensitive intervals, with an improvement of more than 50%. The AIS indicator shows that IBES-KELM and IMBES-KELM generate intervals of better quality. In general, IMBES-KELM generates a more sensitive interval, which allows faults to be diagnosed more accurately.
The proposed method is of practical significance for the efficient operation of WWTPs. Through accurate prediction and diagnosis of faults, it can provide early warning signals to operators, enabling them to take appropriate measures to prevent accidents and ensure smooth and efficient operation of the WWTP.
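Following the verbal definitions given above, FAR and FDR can be computed as FAR = FD/TD and FDR = FA/TA. The ratio form, the function name `far_fdr` and the toy alarm sequence are assumptions of this sketch. Note that many references normalise false alarms by the number of fault-free samples instead; the version below follows the definitions stated in this paper.

```python
from typing import Sequence, Tuple


def far_fdr(alarms: Sequence[bool], faults: Sequence[bool]) -> Tuple[float, float]:
    """Return (FAR, FDR) with FAR = FD / TD and FDR = FA / TA."""
    if len(alarms) != len(faults):
        raise ValueError("alarms and faults must have the same length")
    fd = sum(1 for a, f in zip(alarms, faults) if a and not f)  # false alarms
    fa = sum(1 for a, f in zip(alarms, faults) if a and f)      # correct diagnoses
    td = sum(1 for a in alarms if a)   # true alarms + false alarms
    ta = sum(1 for f in faults if f)   # correct diagnoses + missed diagnoses
    far = fd / td if td else 0.0
    fdr = fa / ta if ta else 0.0
    return far, fdr


# Made-up alarm sequence (value outside the KDE interval) and true fault labels.
alarms = [False, True, True, True, False, True]
faults = [False, False, True, True, True, True]
far, fdr = far_fdr(alarms, faults)
print(far, fdr)
```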

Conclusions
This paper proposes an improved IMBES-KELM algorithm for predicting water quality indicators in wastewater treatment plant effluent. To address data uncertainty, a prediction interval is generated with the KDE method at a specified confidence level, providing upper and lower bounds for the predicted results. Fault diagnosis is then performed for different fault scenarios. Simulation results show that the proposed method achieves high prediction accuracy for effluent quality and generates smaller prediction intervals at the same confidence level, enabling accurate detection of faults.
Note that the proposed method only predicts water quality indicators and is not the final step in managing the wastewater treatment process. With the surge of data in wastewater treatment plants, it is necessary to consider building a big data ecosystem to support real-time monitoring and management of data streams for online decision-making and real-time fault diagnosis. Applying this work to reliable decision-making and generating appropriate control strategies will also be part of our future work.