Multimodel Anomaly Identification and Control in Wet Limestone-Gypsum Flue Gas Desulphurization System

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China Beijing Key Laboratory of Computational Intelligence and Intelligent System, Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China Beijing Advanced Innovation Center for Future Internet Technology, Beijing University of Technology, Beijing 100124, China Technology Research Center, Shenhua Guohua(Beijing) Electric Power Research Institute Corporation, Beijing 100025, China School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China School of International Studies, Communication University of China, Beijing 100024, China


Introduction
With the increasing emergence of hazy weather over the past few years, environmental deterioration has raised great concern in China. Sulphur dioxide, serving as one of the primary contributors giving rise to atmospheric pollution, has a negative effect on human health as well as the environment [1][2][3]. Moreover, the combustion of fossil fuels, especially coal, in power plants is the major cause for existence of sulphur dioxide in the atmosphere [4]. erefore, the removal of sulphur dioxide in flue gas from coal-fired power plants is of great significance and has gradually become the overriding concern to the public. Among a wide variety of flue gas desulphurization methods, wet limestone-gypsum flue gas desulphurization (WLFGD) is most power plants' first priority due to the fact that it is characterized by low operation cost and relatively high utilization of absorption reagent [5,6]. In order to suppress the pollution brought by sulphur dioxide, the concentration of outlet flue gas along with a slurry pH value are selected as the controlled variables in our case study, and the former of which is a direct indicator that reflects desulphurization performance. Also, it has been verified that the pH value in slurry tank has a major influence on desulphurization efficiency [7].
Anomaly detection (or outlier detection) is the process of identifying unexpected observations or events, which differ significantly from the majority of the data [8]. Due to the fact that anomaly detection is an area of high application value, it has been extended to various fields, especially in the industry. In this work, we focus on the case where flue gas desulphurizes in the context of anomalous conditions. It is well established that industrial processes and facilities therein are of ever-increasing complexity over the past few years, which results in a greater chance for occurrences of distinct faults. Moreover, given that flue gas desulphurization is a type of chemical engineering process, there exist some unique faults that induce anomalous conditions other than traditional industrial ones, such as catalyst deactivation, valve blockage, or compressor failure [9]. us, regarding chemical systems, a special attention must be placed on reliability and stability of systems during the whole operation, which entails detecting anomalous conditions and performing effective control on the system. In the past several decades, there are quite a few publications covering the subject of anomaly detection. However, there are few publications covering anomaly detection in the desulphurization system, let alone apply control techniques in anomalous conditions therein. To the best of our knowledge, nearly all of the contributions are concerned with fault detection and diagnosis problems. In literature [10,11], principal component analysis (PCA) with monitoring indices Hotelling's T2 and squared prediction error is utilized to find the exact fault appearance during a specified timeframe in flue gas desulphurization system. e difference in kernel PCA is applied in the latter work, and the hybrid prediction model, which contains autoregressive integrated moving average model and least squares support vector machine model, is suggested in [12] to make prediction of the slurry pH value in the first place, and then a fault diagnosis system is designed to locate faults of the pH sensor. Additionally, modelling based on physical characteristics of combustion in flue gas desulphurization is introduced in [13]. In this method, a flue gas estimation model is built and used to identify process anomalies together with sensor malfunctions.
As for publications mentioned above, there exists a commonality that the realization of fault detection or diagnosis with proposed methods is driven by labeled instances, namely, all instances used contain explicit labels of various anomalies. Nevertheless, from a practical point of view, it is challenging to group anomalies into multiple categories (i.e., faults of different types) accurately in WLFGD system, as the malfunction of certain part in system may trigger successive faults in other components. Accordingly, the classification results depend critically upon individual experience in practice. In this case, the usage of exploring and grouping method is more advantageous for the extraction of underlying anomalous conditions in practical applications. erefore, in this article, the integration of isolation forest and adaptive mean shift clustering are employed to determine types of anomalous conditions existing in our system. Another issue that has to be taken into account is what kind of control strategy should we apply in this context. Here, multimodel control is utilized in our case study due to complicacy and diversity of operating environments in the whole desulphurization process. More specifically, as a fair number of devices and plants are employed on-site, there are many possibilities for occurrences of anomalies, which are actually one of the major reasons for variations in the environment. Moreover, since many subprocesses in flue gas desulphurization are interactive, the presence of unknown and discontinuous anomalous conditions in any subprocess, even caused by some unnoticeable devices, may bring about parameter changes in controlled actual system. Given that multimodel control is particularly suited to cope with a class of control problems, which caused by parameter uncertainties due to changes of operating environments, it is selected as the control strategy in our case study. e main contribution of this work can be summarized as follows: (1) a novel exploring and grouping technique is proposed to provide a detailed analysis on anomalous environments in the industrial field. (2) MPDNN-based adaptive control is proposed and extended to multimodel for the first time. (3) Different combinations of control modes with respect to the reinitialized MPDNN adaptive model are compared and discussed. e remainder of this paper is organized as follows: Section 2 describes the principle and process of wet flue gas desulphurization briefly. en, the integrated anomaly detection technique based on isolation and clustering algorithms is introduced in Section 3. Section 4 discusses theoretical knowledge on identification and control process of multimodel system with MPDNN. In Section 5, simulation studies show the efficacy of the presented control strategy by comparing against numerical cases. Finally, Section 6 concludes this paper and our future work is given.

The Introduction of Flue Gas Desulphurization System
In fact, the whole process of sulphur dioxide abatement in WLFGD system is complicated, although the chemical equation is seemingly simple from its form. e overall reaction of WLFGD can be expressed in terms of a chemical equation as follows: e basic reaction principle can be obtained from the overall reaction equation above. at is, the limestone in slurry serves as an absorbent reacting with sulphur dioxide in flue gas, and CO 2 along with gypsum (CaSO 4 ·H 2 O) is finally produced. In the following, the process of wet flue gas desulphurization is outlined. For purposes of clarity, the general structure of WLFGD system is presented in Figure 1, which provides an illustration for procedures in desulphurization. e limestone slurry as the absorption agent is fed to the absorber from slurry tank. In this process, both density and flux of the slurry can be measured via a densimeter and flowmeter mounted on the duct. At the same time, circulating pumps are utilized to rise up the slurry so that it can drop down from the top of the absorber. On the contrary, flue gas passes through the heater exchanger before entering the absorber from the inlet duct in order to lower 2 Complexity gas temperature. As the inlet duct is situated at the lower part of the absorber, it would result in countercurrent contact between the flue gas and limestone slurry. Next, consequential chemical reaction between two components removes sulphur dioxide content in flue gas. Finally, desulphurized flue gas discharges from the stack and gypsum, as a solid byproduct, is obtained in absorbent slurry, which can be applied in fields such as wallboard and cement [14,15].

Extraction of Anomalous Conditions in WLFGD System
During the operation of WLFGD system, the operating data with an array of measured variables was collected from a 1000 MW coal-fired power plant unit every minute. In this section, some preliminary analyses of relevant variables are performed in the first place, which lays a foundation for anomalous conditions extraction later in this section. en, relevant techniques utilized in the process of seeking anomalies are detailed.

Preliminaries.
Here, four output variables in the system are selected to perform analysis on anomalies. ey are outlet O 2 concentration, outlet sulphur dioxide concentration, outlet flue gas temperature, and slurry pH value. Before searching for anomalous conditions, the Pearson correlation coefficients between variables are calculated, and they are visualized in the form of correlation matrix ( Figure 2). e correlation matrix is featured by symmetry, unit diagonal, and semidefiniteness [16], each entry in the matrix stands for the correlation coefficient between two variables. e matrix mentioned above explicitly indicates the degree of relationship between each pair of variables. For notational simplicity, each variable mentioned above is denoted by parts of its full name in the obtained correlation matrix. As it is shown in Figure 2, the absolute values of Pearson correlation coefficients regarding four variables are all over 0.50, some of which even exceed 0.92, which reflects high levels of correlation among four output variables in WLFGD system. By [17], variables with high correlation imposes restrictions on the performance of anomaly detection, for a minor deviance from normal scope of correlated variables may induce relatively greater anomaly. Moreover, high-dimensional data would bring about the problem of curse of dimensionality [18,19]. PCA, as one of the most commonly used feature extraction techniques, reduces dimension by generating fewer principal components (PCs) to substitute for original variables. On the contrary, PCs are uncorrelated to each other [20]. In this sense, the problems arose by highdimensionality data and correlated variables in anomalous conditions extraction can be addressed by means of PCA. Furthermore, the number of components to retain is another problem that needs to be taken into account before dimension reduction. e determination of PCs' number can refer to criterions as follows [21]: (1) eigenvalues of each PC should be greater than one; (2) each component explains at least 5% of variance; and (3) cumulative explained variance reaches 70%. Here, the summary of eigenvalues, proportion of explained variance, and cumulative explained variance for PCs with different numbers calculated by above four output variables are tabulated in Table 1.
As is shown in    two PCs is up to 62.3% and 25.3%, respectively, and cumulative explained variance reaches 87.6%. Based on the above-stated criterions, the retention of two PCs is preferable. Consequently, foregoing four output variables are reduced to two principle components for further analysis on anomalous conditions below.

Extraction of Anomalous Conditions.
Having reduced dimensionality of practical measured data in WLFGD system, anomalous conditions would be detected and extracted thereafter, which can make necessary preparation for the construction of the multimodel control system in the next section. In this subsection, the integration of isolation-based and clustering-based method is employed to perform the task of anomalous conditions extraction in the context of WLFGD. From a practical point of view, computation efficiency is of crucial importance to anomalies detection, especially in industrial scenarios, due to the requirements of initiating prompt action to maintain system stability. Isolation forest (IF) is a type of ensemble-based anomaly detection method proposed by Liu et al. in [22], which is characterized by high accuracy, low memory cost, and computation complexity [23], thus can guarantee the timeliness of searching for anomalies in system. Furthermore, for isolation forest algorithm, anomalous instances are not required during the training process [24]. erefore, it is well-suited to implement in an industrial environment. e rationale of using IF to detect anomalies is on the basis of inherent properties of anomalies, that is, the quantity of deviated data is small and they are in stark contrast to normal ones. On these bases, provided that a tree structure is utilized to describe the recursive process of data space partitioning, anomalous data are more likely to be separated thus in the upper position of the tree compared with normal points. More specifically, let x n N n�1 ⊂ R d be the points of interest, where N and d denote the number of instances and attributes, respectively; the procedure for formation of an isolation forest is summarized as follows: (1) A subset is built by selecting k (k <N) instances randomly from N instances, and an iTree would be constructed by partitioning this subset recursively. (2) Firstly, select an attribute from the subset arbitrarily. en, a split point is chosen as any point within the selected attribute in this subset.
(3) e data space constituted by the subset is segmented into two subspaces, which correspond to left and right subtrees, by the chosen split point. e data would then be assigned to corresponding subtrees followed by comparing their values with the split point in the specified attribute, and the smaller and greater ones correspond to left and right subtrees, respectively. (4) Repeat the second and third steps for each new subspace until there is only a single point in the subspace, or the height limit h of iTree is fulfilled. h is a parameter depending on the size of the subset with the form of h � [log k 2 ]. In this way, an iTree is established. (5) Constructing n iTree according to steps stated above such that the isolation forest {T 1 , . . ., T n } can be formed.
As for any point x, the partition time to isolate that point in ith iTree can be referred to as path length L(x i ), which provides reference for judging whether x is an anomaly. In this case, n path lengths can be acquired, and each of which corresponds to an iTree in{T 1 , . . ., T n }. en, the path length for x in an isolation forest is calculated by taking the average of n path lengths, i.e, e anomaly score that measures the outlierness of x is defined in [17] and given as where k is the number of instances in the subset, and c(k) is used to normalize L(x) and can be written in the form with H(x) denoting the harmonic number, which can be estimated by Euler's constant as H(x) � ln(x) + 0.5772156649. Taking two extreme cases into account, where the minimum and maximum distances from x to the root of each full-grown iTree are achieved, respectively. At this point, E(L(x)) reaches the minima 0 and maxima n − 1. Substituting two extreme values of E(L(x)) into (3), it can be concluded that the anomaly score for x is within the range [0, 1]. Once the anomaly score has been calculated for each sample, anomalies can be sought out based on the threshold of anomaly score, and the more it approximates to 1, x is more likely to be an anomaly. As a rule of thumb, the threshold of anomaly score is taken as 0.6. In our case study, the instances contained in the subset and the number of trees are set as 256 and 100, respectively, which have been proven to be the optimal parameters that yield the best performance of anomaly detection [22]. On this basis, anomalous points were searched among dimension-reducing operating data. To simplify the presentation, the result obtained in a certain period of time is given in Figure 3. As shown in Figure 3, partial detected anomalies are marked by red points and remaining points are considered as normal ones. Likewise, anomalous points over different periods of time can be collected and pooled together. Although all anomalies have been found at this stage, there is a need to determine the type of anomalous condition for further study. However, due to the complexity of operating conditions in practical WLFGD scenarios, it is challenging to know the specific number of anomalous conditions, and the rational partition of them is also an intractable problem. Here, adaptive mean shift is applied to address the encountered problem.
Mean shift (MS) is a nonparametric mode seeking algorithm based upon kernel density estimate, which is able to find clusters with any shapes in data [25]. e major downside to MS is its high computational complexity in high dimensions [26]. Nevertheless, in our case study, the dimension of feature space has been reduced in advance, which avoids the computational problem it brings. Meanwhile, the number of modes (i.e., clustering numbers) can be automatically determined by the algorithm; therefore, MS is of practical significance for determining the type of anomaly in our case study. e principle of MS clustering is outlined as the process of moving each point towards the direction of maximum local density, and the direction is determined via kernel density estimate. e movements of each point are repeated until the point converges to another point with local maxima of density (i.e., mode). Lastly, points with the same mode are classified as the identical cluster. e bandwidth of a kernel function is the unique and key parameter in MS which affects the clustering result of the MS significantly [27,28]. Based on whether the bandwidth is variable during the shifting of points, MS can be divided into two groups, i.e., constant and adaptive mean shifts. As for constant bandwidth, either too small or too large bandwidths may lead to inappropriate clustering. By comparison, adaptive mean shift (AMS) can yield better results for clustering. Assume x i ∈ R d , i � 1, . . . , N be the data points in a d-dimensional feature space, and each of which is assigned with a bandwidth hi. e density estimation of the sample point with kernel profile k is defined as where Gaussian kernel is used; the corresponding profile is a function of the form , then the gradient of kernel density estimation shown in (5) is written as where the rightmost part in (7) is known as the mean shift vector, whose direction yields maximum density increment. Each point in the feature space moves to local density maxima progressively with the direction of gradient, namely, the direction of mean shift vector. Now, there arises the question that how to determine the adaptive bandwidth h i . Here, the k-nearest neighbor method in [26] is adopted. at is, the bandwidth of x i is dependent on its kth distant point x i,k , and h i takes the form It follows from (8) that the value of k greatly influences the clustering outcome by changing the bandwidth of each point. On the basis of detected anomalous points, clustering is performed by adaptive mean shift with different k. Meanwhile, silhouette coefficient is introduced as the measure of clustering performance to select the optimal k value. As for the clustering silhouette coefficient, the value is in the range [− 1, 1] and the rationality of the clustering result is proportion to its value. With anomalies collected by IF, the silhouette coefficient is calculated and plotted against k with the interval of 50; the result is presented in Figure 4.
As is depicted in Figure 4, it is clear that the clustering silhouette coefficient reaches its maxima when k equals to 100. In this sense, the optimal clustering effect can be achieved with k � 100. Accordingly, the subdivision of anomalous conditions in WLFGD system was carried out by AMS in this context. To enhance the accuracy of clustering, both dimensions of obtained anomalies are scaled to [− 1, 2] before clustering, and the result is shown in Figure 5.
As illustrated in Figure 5, anomalies are classified into three categories, each of which is constituted by dots in a certain color. Meanwhile, the mode of each category is Complexity marked by a solid circle, respectively. At this stage, the data of various anomalous conditions can be extracted via the clustering result mentioned above. en, the dataset corresponding to each type of anomalous condition was randomly classified into two portions, including 70 percent for model training and 30 percent for control testing, and both of the training and control process would be made clear in the next section.

Identification and Multimodel Adaptive Control in Anomalous Conditions
In effect, anomalous conditions extraction done before makes provision for adaptive multimodel control introduced in this section. e overall control process can be briefly summarized as two steps. First, models are established to characterize different types of anomalous conditions extracted above. en, the multimodel control strategy is applied in connection with those models. Abovementioned two steps correspond to the following two subsections separately. Besides, as pointed out in the first part of this work, the pH values in limestone slurry and outlet sulphur dioxide concentration are chosen as controlled variables (i.e., output variables) of the system. Second, the flux and density of limestone slurry, which are readily available through sensors meanwhile have a considerable impact on system outputs, are selected as control variables (i.e., input variables) of the system.

System Identification.
Dynamic neural network has drawn increasing attention and been exploited in many fields, due to its preferable capabilities in representation of nonlinear system compared with the static one. Moreover, dynamic neural network used for nonlinear system identification can be further categorized as series-parallel model and parallel model [29]. In view of the fact that reactions taking place in the absorber have high level of complexity, thus there exist great obstacles to model the system with chemical mechanism. Meanwhile, high nonlinearity and delay existing in the process of slurry pH variations further add the difficulty for system identification in the context of WLFGD. On account of the abovementioned merits of dynamic neural network, here, the parallel form in it is utilized to identify the system in our case study. e strengths of this type of neural network can be summarized as follows: (1) the output of actual system is less affected by the bias resulting from noise [30], and (2) its identification model is particularly suitable for using offline. As the parallel dynamic neural network with no hidden layer has been described in detail in [31][32][33], in what follows, the description of MPDNN is our main concern. Consider the nonlinear system to be identified is given as where the state x t ∈ R n and input u t ∈ R m , and the dynamic neural network of the following form is used to identify the system: with x t ∈ R n being the estimate of state. A ∈ R n×n is a diagonal Hurwitz matrix. e control input function is assumed bounded and taken as c(u t ) � u t . Let l denote the number of nodes in the hidden layer, then V 1,t and V 2,t are lby-n weight matrices and W 1,t and W 2,t are n-by-l weight matrices. σ(·) ∈ R l is a sigmoidal vector function, and ϕ(·) ∈ R l×l is a diagonal matrix constituted by elements of the sigmoidal vector function, which can be represented as , and the component of sigmoidal vector function takes the form In reality, nonlinear system (9) cannot be represented by (10) precisely due to the presence of unmodeled dynamics Δf, which can be formulated as  6 Complexity with W * 1 , W * 2 , V * 1 , and V * 2 being any fixed weight matrices. Since the matrix A in (10) is stable, the pair (A,R 1/2 ) is controllable, the pair (Q 1/2 ,A) is observable, and the local frequency condition expressed by (13) holds [34] en, the matrix Riccati equation of the form, has a positive symmetric solution P. en, it follows from [35] that weight matrices update law can be given as follows: where the identification error Δ t is denoted as x t − x t . Gain matrices K i 4 i�1 ∈ R n×n are utilized to determine the learning rate of neural network. Both Λ 1 and Λ 2 are positive definite matrices, and P stands for the solution of Riccati equation in (14). In addition, , and when t � 0, V i,0 � V * i . In our case study, flux F t and density D t of limestone slurry are regarded as inputs of the neural network. On the contrary, outlet sulphur dioxide concentration x 1t and slurry pH value x 2t are quantities to be estimated.
us, the input-output mapping relation can be given as where f is a nonlinear vector-valued function. Aimed at enhancing the convergence speed, all measurements are scaled to zero mean and unit variance before the online identification. In the following, system identification is carried out with or without the existence of hidden layer in neural network. Further, in the case where the hidden layer is involved, two and three nodes are used, respectively, in order to find the form with the optimal identification accuracy. Here, due to the space limitation, only one type of extracted anomalous condition is chosen to be identified. As for the neural network without hidden layer, ϕ(·) and weight matrices V 1,t , V 2,t in (10) are set as the identity matrix. Sigmoidal functions in all three cases are given as Initial weight matrices are given as Additionally, A � − 20 0 0 − 10 , R � 15 0 0 10 , and Q � 20 0 0 10 . us, the solution P of (14) can easily be calculated as P � 2 0 0 1 . Adaptive gains are selected as K 1 � K 2 � 15 and the initial states are set as x 0 � [0, 0] T . It should be noted that the selection of A, R, and Q should preferably make sure that the solution P is a real matrix such that computation time in the identification process can be reduced. Moreover, adaptive gains can be appropriately increased to speed up the identification process. On this base, identification for anomalous condition in the selected training set is obtained by this no-hidden-layer dynamic neural network with weight updating law given in [32]. e estimates and actual outputs are given simultaneously in Figure 6 for comparison purpose.
As for MPDNN with one hidden layer, output weights are the same as W 1,0 and W 2,0 set above, and input weight matrices in the hidden layer are initialized as In such a case, the system is identified by MPDNN in (10) with two nodes in the hidden layer, weights are updated based on (15) with K i � 15 (i � 1, 2, 3, 4), Λ 1 � Λ 2 � I, while the setting of remaining parameters remain unchanged. Figure 7 presents the identification result for both states.
Likewise, in the case where three nodes are in the hidden layer, initial weight matrices are given as follows: Remaining parameters are selected the same as those in the former two cases; the identification results are shown in Figure 8. It should be mentioned that the excessive number Complexity of parameters in weight matrices may pose limitations on identification speed. erefore, certain elements in the abovementioned weight matrices are set as zero and unchanged during the identification process.
Next, identification is conducted for the testing set with the same parameters. Due to the space limitation, results are not visualized here. Instead, the identification performance of three kinds of neural networks is measured by the mean squared error (MSE) criterion, and corresponding results in training and testing sets are presented in Table 2.
From Table 2, it can be easily concluded that MPDNN outperforms the no-hidden-layer one both in the training set and testing set. More specifically, the one with three nodes in the hidden layer possesses the optimal identification performance. Due to the fact that the control scheme designed in the following section is based on the principle of certainty equivalence, thus the control performance depends critically on identification accuracy of models. On this base, MPDNN with three nodes in the hidden layer is selected to model other two types of anomalous conditions based upon their   8 Complexity respective training sets, and they would serve the control purpose below.

Control Design.
As discussed in the introductory section, it is truly difficult to exert effective control over the whole desulphurization process; this can be attributed to the fact that actual operating environments in WLFGD system are complex and manifold. All the above considerations suggest that the adaptive multimodel control strategy is particularly suited to cope with such kind of problem, where the nonlinear system with large parametric uncertainty requires to be controlled under a time-varying environment. However, there also exist some issues precluding the application of multimodel control in practical scenarios, and we shall discuss them in the following.
To simplify the exposition, some mathematical notations are introduced into the statement of the problem. Let p be the parameter vector of the plant at a certain instant, while p i denotes the parameter of the ith model among multiple models at the same instant. It is assumed that both p and p i lie within a d-dimensional compact set U in the parameter space. Our objective is to estimate by means of multiple models at each instant. As for the ith fixed model, parameter vector p i is located in the same position at all times. Rather, an adaptive model would approximate p progressively by parameter adjustment from initial values, hence parameters of adaptive models are time-varying before convergence is reached. By far, a great deal of research has been conducted, both in linear and nonlinear cases, to find the optimal implementation of multimodel control, and distinct combinations of models are studied, which include (1) fixed models only; (2) adaptive models only; (3) fixed models and free running adaptive models; and (4) fixed models, reinitialized and free running adaptive models. It has been demonstrated that the last control architecture, which is of interest in this work, could yield the best performance, and the reader can refer to [36,37] for more details. With this architecture, as long as one of the fixed models is selected, parameters of the reinitialized adaptive model would be reset to those of the selected models, and then adaptation of parameters would proceed from initialized ones. Provided that p is located somewhere in U at a certain instant, the neighborhood of p i N i�1 can be viewed as a class U i N i�1 distributing in U. In this case, fixed models are served as providing initial parameter estimates for p such that it is included by a certain U i . en, tuning is carried out based on p i , namely, a point in U i , to asymptotically converge to the real parameter p. e two stages in this process can eliminate the transient and steady-state error, respectively. Obviously, whether the control architecture of this sort can be applied effectively largely depends on the closeness between parameter p of fixed models (i.e., initial parameters of the adaptive model at switching instant) and p of the real plant at  Apart from the qualitative problem for determination of initial parameters, a quantitative problem concerning the number of fixed models is also formidable. e latter problem imposes limitations on practical application of such kind of control strategy as well, for computational overhead and efficiency that deal with the number of models must be taken into account. From a practical point of view, it is often the case that plants only switch among several subsets in U. us, there is no point in building fixed models with improper parameters. To handle two above-stated problems, sufficient knowledge about U should be gained such that parameters of fixed models are obtained; further, each subset U i (i.e., the corresponding area in which adaptation of parameters takes place) could be determined. To this end, as indicated earlier, anomalous conditions were analyzed and extracted in turn and then subdivided into varied types. In this sense, information regarding anomalous conditions on-site can be obtained thoroughly, based on which the number of fixed models is therefore determined. Our idea is illustrated intuitively in the form of a flow diagram (Figure 9).
It can be seen clearly from Figure 9 that the multimodel control process is partitioned into two parts in our work, where the left-half part makes detailed analysis for different types of anomalous conditions by data exploring and mining. In this way, qualitative and quantitative information on anomalies is acquired, which could be further utilized to provide guidance for model set building in the right-half part of Figure 9. At the stage of control design, a corresponding controller is designed for each constructed model and the controller set C i N i�1 is established. In fact, models in the model set I i N i�1 and controllers in C i N i�1 are pairwise. at is, one pair in IC i N i�1 is selected at each instant based on the switching scheme introduced later.
As stated in the introductory section, both controlled variables, slurry pH value, and outlet flue gas concentration are significant indicators that can measure the effectiveness of desulphurization. In our case study, a regulation problem would be resolved, i.e., multimodel adaptive control was carried out to track the reference values we set. e set point of slurry pH value should make trade-off between desulphurization efficiency and safety problems caused by corrosiveness. It has been verified in [38] that the reasonable value of pH is 5.5 at normal operation conditions. Besides, the desired outlet flue gas concentration is set as 3.6 mg/m 3 in light of the practical condition and emission standard. Hereafter, the desired states are represented as x * � [x * 1 , x * 2 ] T and the control error vector is written as e � x * − x. Differentiating e with respect to t such that dynamics of the control error can be derived, which is expressed as and then (10) is invoked to rewrite (21) in the following form: If we choose the control input to be in the form where the first term on the right-hand side of (23) denotes Moore-Penrose inverse of matrix W 2,t ϕ(V 2,t x t ).
Assumption 1. e number of nodes l in the hidden layer and the number of states n satisfy l ≥ n. Furthermore, n-by-l matrix W 2,t ϕ(V 2,t x t ) has full row rank, i.e., r[W 2,t ϕ(V 2,t x t )] � n, such that the product of W 2,t ϕ(V 2,t x t ) and its Moore-Penrose inverse equals to the identity matrix. In particular, W 2,t ϕ(V 2,t x t ) is a square matrix when l � n, then the condition becomes nonsingularity of the matrix W 2,t ϕ(V 2,t x t ) [39,40].
By assumption, substituting (23) into (22), the closedloop dynamics of error is obtained: Clearly, as A is a Hurwitz matrix, the control error is asymptotically stable with the control input we implemented. In addition, the switching time and sequence of multiple models are entirely dependent on the switching scheme. e switching scheme is designed on the basis of the performance index for each model. For the assurance of switching accuracy, both instantaneous and cumulative identification errors are required to be taken into account in performance index J i (t) N i�1 . e performance index chosen in our study has the form: with α and β being the weights of instantaneous and cumulative identification errors. e i (t) denotes the identification error at instant t, and the exponential weighting method is applied in the cumulative error calculation, wherein λ indicates the forgetting factor. e rationale for switching scheme is to search for min i J i (t) at each instance. e smaller performance index J i is the higher identification accuracy of the ith model. On this base, the corresponding controller C i (t) is chosen such that the best control performance can be yielded at the corresponding instant.

Simulation Studies
In this section, different multimodel control modes are designed and analyzed on the basis of anomalous conditions extracted in previous sections. Moreover, MPDNN with three nodes in the hidden layer (without further specification, MPDNN denotes this kind of structure in this section) is used for model and controller design in the following cases.

Preparations.
In previous sections, anomalous conditions in different types have been extracted, and one of which has been modelled via MPDNN over its training set. e convergent weights of neural network are recorded as θ 1 in this case. Similarly, the other two types of conditions can be identified over the respective training sets, during which the same technique (including the preprocessing of data) is used and requires suitable choice of parameters in MPDNN. In this way, with the expected identification accuracy (MSE for both states in two cases is less than 2e − 4), three sets of parameter vectors were obtained.
and each a to h in θ are constituted by the corresponding pair of elements in W i and V i (i � 1, 2) as follows:   Complexity e resulting weight matrices mentioned above are used to establish fixed models, respectively. To test and verify the effectiveness of the proposed method in practical WFLGD system, a simulation platform of anomalous conditions in our case study is set up. In this platform, each type of anomalous environment is simulated via MPDNN in the above form over respective testing data. With no loss of generality, normal condition is taken into account by fitting MPDNN to the part of the remaining normal operation data. To assure the accuracy of simulation, parameters for MPDNN in each situation were determined by trial and error such that the optimal identification performance could be achieved. In this way, the final mean squared identification errors for both states in all the four conditions are lower than 1.042e − 3. At this stage, both fixed models and simulation environments in WLFGD system have been established. e platform constructed above would simulate both three types of anomalous conditions and the normal operation condition every 30 units of time, which can be expressed as Remark 1. Initial weights of models have a major effect on the control performance of multimodel adaptive control, particularly in practical applications. e problem of interest in this paper is mainly concerned with the setting of initial weights in anomalous conditions; the discussion and analysis of normal operation conditions are out of scope. Hence, we do not arrange the testing set and build only one model for the normal condition. e initial states of all MPDNN in the simulation platform are set as x 0 � [0, 0] T . In addition, as for different cases conducted below, proper combinations of α, β, and λ were selected for the performance index in (21). In the following, simulations on different cases would be conducted in a sequence, with the objective of testing the effectiveness of MPDNN multimodel adaptive control with different combinations of fixed models.

Experiments.
As mentioned previously, it has been verified that the mode of fixed models with reinitialized adaptive models has the optimal control effect in multimodel control. Moreover, the topic discussed in this work focuses more on the selection of fixed models in practical multimodel control. erefore, combinations of different kinds of models would not be involved in cases below. Instead, varied methods for construction of fixed models are simulated to test the validity of the method proposed in this article. To simplify notations, in cases below, the ith MPDNN fixed model is denoted as IF i , and the conventional and reinitialized MPDNN adaptive models are denoted as IA 1 and IA 2, respectively. e parameters of the model would adjust with variations of the environment. Adaptive control of this sort can be effective when slow variations take place in the environment, whereas the large overshoot would arise at the instant when unanticipated environment is met. e initial value of IA 1 in this case is set as θ 1 . Figure 10(a) displays the response of the plant in different conditions, and Figure 10 In this case, the building of IF i 3 i�1 is dependent on the partition of time series. To be specific, all extracted anomalous points were ordered in ascending time sequence. en, points were equally divided into three portions, and each of which are utilized to set up a fixed model in the same fashion as before with convergence weights and proper parameters, and mean squared identification errors for two states in all the three models are below 8.703e − 4. As for IA 1 applied in this case, which aims at controlling the plant in normal conditions, its initial parameters are set close to weights corresponding to normality in the simulation platform. Moreover, the weights of IA 1 would be initialized again when variations of parameters take place in the simulation platform. e system response and absolute value of the control error are presented in Figures 11(a) and 11(b), respectively. Intuitively, compared with the control mode used in Case 2, the control performance has improved dramatically. It is seen that the plant is controllable under different operating conditions. However, overshoots are somewhat large when parameters change in the plant. Particularly, at instant 90 s, a comparatively long period of time is needed for the adaptation process.  en, anomalous points corresponding to three intervals, namely, [y min , η 1 ), [η 1 , η 2 ), and [η 2 , y max ] (y min and y max are minima and maxima of PC1), are collected and used to build three fixed models by MPDNN as before. IA 1 has the same setting as that in Case 2. Figure 12 presents the corresponding control response in this case. By comparing the absolute value of control error shown in Figure 12(b) with its counterpart Figure 11  In Case 4, the determination of fixed models relies on the extracting results of the anomalous condition obtained in Section 3. erefore, three fixed models are constructed based on weights θ i 3 i�1 . Meanwhile, the setting of IA 1 is consistent with the above cases. e resulting output and control error of the plant are depicted in Figures 13(a) and 13(b), respectively. It is clear that the control mode in this case can yield far better control performance in contrast to all cases mentioned above. From a transient response viewpoint, when parameters of the plant vary at different instants, the output can maintain small overshoot. On the contrary, the steady-state error can reach zero over time. e reason for such improvements lies in the reduction of initial parameter error in IA 2 . As revealed   14 Complexity from the switching sequence in Figure 13(c), each time when the type of anomalous condition shifts, the fixed model with the least performance index would be chosen to lower the transient error in the first place, and then IA 2 would take over as the controller until next environment appears. As the fixed models we built are able to roughly describe actual environments in different time ranges, transient errors are relatively small when switching to a new environment. Also, due to the closeness between parameters of fixed models and real ones, it takes little adaptation time for IA 2 to revert to set points.  Figure 13, it is clear that there is not much improvement in transient response with the variation of anomalous conditions. In fact, it can be seen that the system response in the normal operating condition is even worse. e reason can be concluded from the switching sequence shown in Figure 14(c). At the stage of controlling in normal condition, the parametric closeness of three additional fixed models to true parameters of the plant may give rise to occurrences of frequent switching, which causes intermittent overshoot between 60 and 90 units of time. Furthermore, during simulation, the amount of Complexity computation became prohibitively large in this control mode, which proved once again that the inappropriate choice of model number when applying multimodel control can induce severe consequences in practical scenarios.

Conclusions
is paper focused on addressing the practical control problem in WLFGD system by using the MPDNN multimodel control strategy. e difficulties for the application of multimodel control in real world have been stated in Section 4. With the objective of implementing multimodel control in the context of anomalous conditions, a chain of techniques have been adopted to search and extract different types of anomalies. en, MPDNN, which has an excellent capability of approximating uncertain environments, is utilized to identify those anomalous conditions and models are built accordingly. As indicated in Section 5, in comparison with other cases, the control performance of MPDNN multimodel control system, which is built upon the process we proposed in this work, can yield satisfactory control performance for simulated WLFGD system. Based on this direction, the main concern in our future work would be combinations of data analysis techniques and adaptive control in the context of multimodel control.

Data Availability
e experimental data used to support the findings of this study are included within the article.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.