Industrial Process Monitoring Based on Parallel Global-Local Preserving Projection with Mutual Information

Abstract: This paper proposes a parallel monitoring method for plant-wide processes by integrating mutual information and Bayesian inference into a global-local preserving projections (GLPP)-based multi-block framework. Unlike traditional multivariate statistical process monitoring (MSPM) methods, the proposed MI-PGLPP method transforms plant-wide monitoring into several sub-block monitoring tasks by fully exploiting a parallel distributed framework. First, the original process datasets are divided into a group of data blocks by quantifying the mutual information of process variables; the block indices of new data are generated automatically. Second, each data block is modeled by the GLPP method, so that the variable information and local structure are well preserved throughout the projection. Third, Bayesian inference is introduced to generate final statistics of the process within a probabilistic framework. To illustrate the algorithm's performance, a detailed case study is performed on the Tennessee Eastman process. Compared with principal component analysis and the GLPP-based method, the proposed MI-PGLPP provides higher fault detection rates (FDRs) and superior performance for plant-wide process monitoring.


Introduction
With the rapid development of information technology, intelligent sensors are widely applied in industrial manufacturing processes. For this reason, advanced monitoring systems such as SCADA and IIoT are adopted to retrieve real-time data, and process monitoring has become an emerging concern for researchers and industrial engineers [1,2]. Statistical methods have been intensively researched and applied to multivariate statistical process monitoring (MSPM) [3]. The most common practice is to extract process features with dimension-reduction methods and then construct a latent-variable model for dynamic industrial process monitoring [4]. These methods are based on partial least squares (PLS) [5], PCA [6], canonical correlation analysis (CCA) [7], Fisher discriminant analysis (FDA) [8], independent component analysis (ICA) [9], subspace-aided approaches [10], etc. These methods and their variants are used to monitor linear, nonlinear, time-variant and multimode processes. To further investigate their performance, Shen et al. presented a review [11] comparing the aforementioned methods on the same industrial process.
Most traditional methods solve the MSPM problem by generating latent variables from either global information or local structure; e.g., PCA-based methods use the variance of the dataset to generate projection directions. Meanwhile, a trend of preserving both global and local structures has been observed in recent works [12]. The global-local structure analysis (GLSA) [13] model was first applied to monitoring plant-wide processes and shows promising application prospects. On the basis of the GLSA model, Luo et al. presented GLPP [4] for industrial process monitoring and kernel GLPP (KGLPP) [14] for nonlinear process monitoring. Their research further analyzes the intrinsic relationship between PCA and LPP, revealing its performance theoretically. Similarly, FDA is introduced into GLPP in [3] to enhance fault-identification performance by maximizing the scatter between classes and minimizing the scatter within classes. Huang et al. [15] proposed the KGLPP-Cam model with a new weighted distance named the Cam weighted distance; this approach is used to reselect the neighbors and consequently overcome the non-uniform distribution of the data. For dynamic process monitoring problems, Tang et al. [16] proposed a hybrid framework by integrating canonical variate analysis into GLPP.
Recently, plant-wide process monitoring has become a hot issue and has attracted many researchers' interest. Multi-block methods have been presented to generate sub-blocks of the original dataset, which contributes to significant performance improvements. In general, the multi-block methods consist of three major steps:
• Block division. The industrial process data are divided into several blocks by certain strategies, e.g., mutual information [17] or knowledge-based methods [18].
• Sub-block modeling. A monitoring model is trained for each data block.
• Result fusion. The monitoring results of the sub-blocks are combined into plant-wide statistics.
Based on these steps, researchers have presented several application results. A distributed PCA (DPCA) method [19] was presented, which evolved into the distributed framework presented in paper [20]. This idea has also been extended to big-data applications [21] and quality monitoring [1]. For the purpose of utilizing process knowledge, a hierarchical hybrid DPCA framework was presented by introducing a knowledge-based strategy and mutual information [18]. For non-Gaussian industrial process monitoring, a multi-block ICA [22] was developed and applied to industrial processes. These data-driven methods have the capability to divide the process into several sub-blocks automatically. However, they are based on covariance and are subject to linear constraints. Process decomposition, as a crucial step in multi-block methods, holds significant importance for practical nonlinear fault detection and diagnosis.
Furthermore, a significant portion of research has been focused on the application of Bayesian methods for industrial process monitoring. For instance, Huang and Yan [23] proposed a dynamic process monitoring method based on DPCA, DICA and Bayesian inference. Jiang et al. [24] established a Bayesian diagnostic system based on optimal principal components (OPCs), enabling probabilistic fault identification. Zou et al. [25] employed Bayesian inference criteria for online performance assessment. Bayesian networks are commonly employed in industrial process monitoring to model dependencies and interactions among process variables, sensors and equipment [26]. Bayesian networks provide a systematic approach to handle uncertainty, identify causal relationships and propagate the effects of changes or faults throughout the entire system [27]. For most modern industrial processes, comprehending and evaluating multiple monitoring outcomes from different units or components using Bayesian techniques holds significant importance. However, Bayesian methods necessitate sufficient data for effective inference and updating, and their successful application relies on appropriate model selection and parameter configuration.
Inspired by the distributed PCA framework [20] and mutual information-based block division [17], a mutual information-based parallel GLPP method (MI-PGLPP) is proposed for plant-wide process monitoring in this paper. With the integration of mutual information, GLPP and Bayesian fusion in a multi-block framework, the advantages of MI-PGLPP are as follows:
• MI-PGLPP utilizes the mutual information of the variables and divides data blocks automatically, which does not require prior knowledge of the process.
• MI-PGLPP naturally meets the independence condition of Bayesian inference, since the variables are grouped into blocks according to mutual-information-based independence.
• MI-PGLPP utilizes GLPP to obtain the latent matrix and transformation matrix of each data block. The intrinsic features of the global and local structures are well preserved during the projection.
This paper consists of five sections. The rest is organized as follows: Section 2 briefly reviews the fundamental theory of the GLPP method and mutual information in MI-PGLPP. Section 3 presents the detail of MI-PGLPP for process monitoring. Section 4 gives a case study to evaluate the performance of the proposed method. Finally, Section 5 concludes the paper.

Preliminaries
This section presents a brief theoretical review of the GLPP method and mutual information. These two methods are integrated into the proposed MI-PGLPP method. The detailed information can be seen in the following parts.

Global-Local Preserving Projection
Benefiting from the dimensionality-reduction and feature-extraction capabilities of PCA and LPP, these two methods are often used in industrial process monitoring to project original datasets into a low-dimensional space. During the projection, PCA preserves the variance information while LPP keeps the local neighborhood structure [4]. Therefore, each of these methods and their variants may lose the other kind of information when performing data projection. Unlike these methods, the GLSA-based [13] method aims to capture the spatial correlations of original datasets by preserving both the variance and the local structure information [3]. For a given dataset $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ with $n$ samples and $m$ variables, GLPP formulates a projection matrix $A \in \mathbb{R}^{m \times k}$ and projects $X$ into the lower-dimensional data $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{k \times n}$ ($k \ll m$). In order to retain both the global and the adjacent structures of dataset $X$, the objective function is formulated as follows [4,13]:

$$J_{GLPP}(a) = \min_a \left\{ J_L(a) - J_G(a) \right\}, \quad J_L(a) = \frac{1}{2}\sum_{i,j}(y_i - y_j)^2 W_{ij}, \quad J_G(a) = \frac{1}{2}\sum_{i,j}(y_i - y_j)^2 \overline{W}_{ij}, \tag{2}$$

where $a$ denotes the transformation vector and $y^T = a^T X$. The adjacent weighting matrix $W_{ij}$ and non-adjacent weighting matrix $\overline{W}_{ij}$ are defined as follows [4,13,28]:

$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2/\sigma_1\right), & x_j \in \Omega_k(x_i) \ \text{or} \ x_i \in \Omega_k(x_j) \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\overline{W}_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2/\sigma_2\right), & x_j \notin \Omega_k(x_i) \ \text{and} \ x_i \notin \Omega_k(x_j) \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
where $\sigma_1$ and $\sigma_2$ are constant parameters and $\Omega_k(x_i)$ denotes the $k$ nearest neighbors of data point $x_i$. It is easy to see that the weights $W_{ij}$ and $\overline{W}_{ij}$ both lie in the interval $[0, 1]$. Equation (2) can be reformulated as [3,4]

$$J_{GLPP}(a) = \min_a \left\{ \eta J_L(a) - (1-\eta) J_G(a) \right\}, \tag{5}$$

where $\eta \in [0, 1]$ is the tradeoff between global and local information in the projection.
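As a concrete sketch, the adjacent and non-adjacent weight matrices of Equations (3) and (4) can be built as follows; the function name, the NumPy-based implementation and the default parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def glpp_weights(X, k=5, sigma1=1.0, sigma2=1.0):
    """Build the adjacent (W) and non-adjacent (Wbar) weight matrices of
    Equations (3)-(4). X is m x n (variables x samples); both outputs are
    n x n. Names and default parameter values are illustrative only."""
    n = X.shape[1]
    # pairwise squared Euclidean distances between samples (columns of X)
    D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    # k-nearest-neighbour mask per row (exclude the point itself)
    order = np.argsort(D2, axis=1)
    knn = np.zeros((n, n), dtype=bool)
    rows = np.arange(n)[:, None]
    knn[rows, order[:, 1:k + 1]] = True
    neighbor = knn | knn.T          # x_j in Omega_k(x_i) or x_i in Omega_k(x_j)
    W = np.where(neighbor, np.exp(-D2 / sigma1), 0.0)
    Wbar = np.where(~neighbor, np.exp(-D2 / sigma2), 0.0)
    np.fill_diagonal(Wbar, 0.0)     # a point is never "non-adjacent" to itself
    return W, Wbar
```

By construction the two masks are complementary, so a sample pair contributes either to the local term or to the global term, never to both.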
With some basic arithmetic operations, Equation (5) can be written as

$$J_{GLPP}(a) = \min_a \, a^T X \left( \eta L - (1-\eta)\overline{L} \right) X^T a, \tag{6}$$

where $L = D - W$ and $\overline{L} = \overline{D} - \overline{W}$ are the local and global Laplacian matrices, with diagonal degree matrices $D_{ii} = \sum_j W_{ij}$ and $\overline{D}_{ii} = \sum_j \overline{W}_{ij}$. Furthermore, a common constraint is introduced to transform the above objective function into a constrained optimization problem [15,16]:

$$\min_a \, a^T X \left( \eta L - (1-\eta)\overline{L} \right) X^T a \quad \text{s.t.} \quad a^T N a = 1, \tag{7}$$

where $N = \eta X H X^T + (1-\eta)I$ and $I$ denotes the identity matrix.
It is easy to transform Equation (7) into a generalized eigenvalue problem:

$$X \left( \eta L - (1-\eta)\overline{L} \right) X^T a = \lambda N a. \tag{8}$$

By solving Equation (8), the transformation matrix $A = [a_1, a_2, \ldots, a_k]$ is finally obtained from the $k$ eigenvectors $a_1, a_2, \ldots, a_k$ corresponding to the $k$ smallest eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_k$.
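Under these definitions, the transformation matrix can be obtained with a generalized symmetric eigensolver. The sketch below assumes $H$ in the constraint matrix $N$ is the degree matrix $D$ of $W$, which is one common choice but an assumption here; all names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def glpp_projection(X, W, Wbar, eta=0.5, k=2):
    """Solve the generalized eigenvalue problem of Equation (8) and return
    an m x k transformation matrix A. H is taken as the degree matrix D
    of W (an assumption); eigenvectors of the smallest eigenvalues are
    kept because Equation (7) is a minimization."""
    D = np.diag(W.sum(axis=1))
    Dbar = np.diag(Wbar.sum(axis=1))
    L = D - W                      # local Laplacian
    Lbar = Dbar - Wbar             # global Laplacian
    M = X @ (eta * L - (1 - eta) * Lbar) @ X.T
    H = D                          # assumed constraint weighting matrix
    N = eta * X @ H @ X.T + (1 - eta) * np.eye(X.shape[0])
    vals, vecs = eigh(M, N)        # generalized symmetric eigensolver
    return vecs[:, :k]             # eigenvectors of the k smallest eigenvalues
```

Since $N$ is positive definite (a positive semidefinite term plus a scaled identity), `scipy.linalg.eigh` can solve the generalized problem directly.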

Mutual Information
Mutual information is defined as a non-parametric index [17] and is used to quantify the mutual dependence of variables from the perspective of entropy [29]. For given vectors $x_1$ and $x_2$, the mutual information $MI$ is defined with the aid of the joint probability density function [30]:

$$MI(x_1, x_2) = \iint p_j(x_1, x_2) \,\log\frac{p_j(x_1, x_2)}{p_m(x_1)\, p_m(x_2)} \, dx_1 \, dx_2, \tag{9}$$

where $p_m(x_1)$ and $p_m(x_2)$ denote the marginal probability densities of $x_1$ and $x_2$, respectively, and $p_j(x_1, x_2)$ represents the joint probability density function of variables $x_1$ and $x_2$. By introducing Shannon entropy, Equation (9) can be clarified as follows [31]:

$$MI(x_1, x_2) = H_m(x_1) + H_m(x_2) - H_j(x_1, x_2), \tag{10}$$

where $H_j(x_1, x_2)$ is the joint entropy of $x_1$ and $x_2$. The Shannon entropy of $x$ can be defined as [32]:

$$H_m(x) = -\int p_m(x) \log p_m(x) \, dx. \tag{11}$$

Similarly, the joint entropy $H_j(x_1, x_2)$ can be defined as [31]:

$$H_j(x_1, x_2) = -\iint p_j(x_1, x_2) \log p_j(x_1, x_2) \, dx_1 \, dx_2. \tag{12}$$

Since mutual information describes the dependence of variables in high-dimensional space, it can be used in both linear and nonlinear analysis. Furthermore, it is obvious that stronger relevance between two variables leads to larger mutual information.
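A minimal histogram-based estimate of Equation (10) can be written as follows; the plug-in histogram estimator (rather than, say, a kernel-density estimator) is an assumption made purely for illustration:

```python
import numpy as np

def mutual_information(x1, x2, bins=16):
    """Histogram estimate of Equation (10): MI = H(x1) + H(x2) - H(x1, x2).
    An illustrative plug-in estimator; bin count is a free parameter."""
    joint, _, _ = np.histogram2d(x1, x2, bins=bins)
    p_joint = joint / joint.sum()       # empirical joint distribution
    p1 = p_joint.sum(axis=1)            # marginal of x1
    p2 = p_joint.sum(axis=0)            # marginal of x2

    def entropy(p):
        p = p[p > 0]                    # 0 * log 0 is taken as 0
        return -np.sum(p * np.log(p))

    return entropy(p1) + entropy(p2) - entropy(p_joint.ravel())
```

As expected from the definition, strongly related vectors yield a larger estimate than independent ones, and the plug-in estimate is always non-negative.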

Fault Diagnosis Based on MI-PGLPP
This section explains the detail of the proposed MI-PGLPP method. Its framework is shown in Figure 1, which consists of three major steps.

• Step 1: Block selection. The original dataset is divided into several sub-blocks.
• Step 2: Offline model training. The block models are obtained from the sub-blocks of the dataset.
• Step 3: Online monitoring. The statistics of each sub-block are generated from the online data, and the final statistics pair is calculated by fusion strategies.

Mutual Information-Based Variable Block Division
A large-scale process such as a plant-wide process takes considerable resources to train a universal monitoring model, and such a model does not always perform as well as intended. Thus, several distributed methods have been proposed for the plant-wide monitoring problem [20,21,33]. These methods divide the dataset into several blocks via physical structure analysis, empirical analysis or quantitative calculation. Mutual information is a typical quantitative technique and is used in various methods to deal with variable-selection problems.
As in papers [17,29], mutual information is introduced to divide the original process data into several data blocks. In each block, the variables have stronger dependence, which contributes to less modeling effort and better data features. For dataset $X \in \mathbb{R}^{m \times n}$, the mutual information $MI_{x_i, x_j}$ is calculated by Equation (10), where $i, j \in \{1, 2, \ldots, m\}$. A numerical example of the MI calculation is shown in Figure 2. The division principle depends on the MI value; for example, if $MI_{x_i, x_j} > MI_{x_i, x_k}$, variable $x_j$ is assigned to the block containing variable $x_i$. The dataset $X$ can thus be divided into the blocks

$$X = \left[ X_1^T, X_2^T, \ldots, X_N^T \right]^T, \tag{13}$$

where $N$ denotes the block number and, for $i \in \{1, 2, \ldots, N\}$, $X_i \in \mathbb{R}^{m_i \times n}$ with $m_i$ the dimension of data block $X_i$. This step partitions the original dataset into a set of data blocks by quantifying the mutual information between process variables, and it automatically generates the block indices for new data.
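The division principle can be sketched as below, assuming one seed (representative) variable per block; the seed-based simplification and all names are illustrative rather than the paper's exact algorithm:

```python
import numpy as np

def divide_blocks(MI, seeds):
    """Assign each variable to the block of the seed variable with which it
    shares the largest mutual information. MI is the m x m mutual-information
    matrix; 'seeds' lists one representative variable index per block.
    The seed-based strategy is an illustrative simplification of the
    division principle described above."""
    m = MI.shape[0]
    blocks = {s: [s] for s in seeds}
    for v in range(m):
        if v in seeds:
            continue
        # pick the seed s maximizing MI(x_v, x_s), per the division principle
        best = max(seeds, key=lambda s: MI[v, s])
        blocks[best].append(v)
    return [sorted(b) for b in blocks.values()]
```

With a block-diagonal-like MI matrix, strongly dependent variables end up in the same block, matching the intent of Equation (13).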

GLPP-Based Block Data Modeling
For the $i$-th data block $X_i^T \in \mathbb{R}^{n \times m_i}$, the offline normal-operation model can be trained by GLPP:

$$X_i^T = Y_i A_i^T + E_i, \tag{14}$$

where $A_i = [a_{i1}, a_{i2}, \ldots, a_{i m_{ik}}] \in \mathbb{R}^{m_i \times m_{ik}}$ denotes the transformation matrix, $Y_i = [y_{i1}, y_{i2}, \ldots, y_{i m_{ik}}] \in \mathbb{R}^{n \times m_{ik}}$ denotes the latent variable matrix, and the matrix $E_i$ represents the residual.
In each block, the monitoring performance depends on the selection of the parameter $\eta$. In order to balance the global and local information in the latent matrix, several feasible selection principles have been applied according to the dataset. For industrial time-series analysis, a spectral-radius-based approach [4,16] is used to select the parameter $\eta$:

$$\eta = \frac{\delta(\overline{L})}{\delta(L) + \delta(\overline{L})}, \tag{17}$$

where $\delta(L)$ and $\delta(\overline{L})$ denote the spectral radii of the local and global Laplacian matrices, respectively. For block $i$, a new data sample $x_i^{new} \in \mathbb{R}^{1 \times m_i}$ is projected onto the $i$-th well-trained offline model:

$$y_i^{new} = x_i^{new} A_i. \tag{18}$$

The $T^2_{i,new}$ and $SPE_{i,new}$ statistics of the $i$-th block are calculated as follows:

$$T^2_{i,new} = y_i^{new} \Lambda_i^{-1} \left(y_i^{new}\right)^T, \qquad SPE_{i,new} = e_i^{new} \left(e_i^{new}\right)^T, \quad e_i^{new} = x_i^{new} - y_i^{new} A_i^T, \tag{19}$$

where $\Lambda_i$ is the covariance matrix of the latent variables of the $i$-th block. When $T^2_{i,new} > T^2_{i,lim}$, a process fault is triggered by the $T^2$ statistic of the $i$-th block model, and likewise for $SPE$. Since there are several blocks, the final monitoring results are further calculated by a fusion strategy. According to the literature review, a probabilistic framework is the common solution to this problem; the following section introduces a Bayesian inference strategy to generate the final monitoring results.
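The per-block statistics can be sketched as follows, assuming the latent covariance is estimated from the training scores and the residual is formed by back-projection with the transformation matrix; these are modeling assumptions of the sketch, not quotations from the paper:

```python
import numpy as np

def block_statistics(x_new, A, Y_train):
    """Compute the T^2 and SPE statistics of one block for a new sample.
    x_new: 1 x m_i sample, A: m_i x k transformation matrix,
    Y_train: n x k latent matrix from training. The back-projection
    y A^T used for the residual assumes near-orthonormal columns of A."""
    y_new = x_new @ A                        # project onto the trained model
    Lam = np.cov(Y_train, rowvar=False)      # latent covariance (k x k)
    t2 = (y_new @ np.linalg.inv(Lam) @ y_new.T).item()
    e = x_new - y_new @ A.T                  # residual part of the sample
    spe = (e @ e.T).item()
    return t2, spe
```

A fault in block $i$ would then be flagged by comparing `t2` and `spe` against the block's control limits.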

Bayesian Inference-Based Monitoring Result Fusion
Bayesian inference is prevalent in distributed monitoring methods such as distributed PCA [35] and distributed ICA [36]. This section presents the Bayesian inference-based fusion of the monitoring results.
Firstly, the normal operating condition and the faulty condition are tagged with $\mathcal{N}$ and $\mathcal{F}$, respectively. $p^i_{T^2}(\mathcal{N})$, $p^i_{T^2}(\mathcal{F})$, $p^i_{SPE}(\mathcal{N})$ and $p^i_{SPE}(\mathcal{F})$ are the prior probabilities of the $T^2$ and $SPE$ statistics of the $i$-th block corresponding to the operation modes $\mathcal{N}$ and $\mathcal{F}$, which can be set from the significance level as $1-\alpha$ and $\alpha$. The conditional probabilities of the $T^2$ statistic are determined by

$$p^i_{T^2}\left(x_i^{new} \mid \mathcal{N}\right) = \exp\left(-\frac{T^2_{i,new}}{T^2_{i,lim}}\right), \tag{20}$$

$$p^i_{T^2}\left(x_i^{new} \mid \mathcal{F}\right) = \exp\left(-\frac{T^2_{i,lim}}{T^2_{i,new}}\right), \tag{21}$$

and analogously for the $SPE$ statistic. Based on Equations (20) and (21) and the prior probabilities, the fault probability of the data point $x_i^{new}$ can be calculated as

$$p^i_{T^2}\left(\mathcal{F} \mid x_i^{new}\right) = \frac{p^i_{T^2}\left(x_i^{new} \mid \mathcal{F}\right) p^i_{T^2}(\mathcal{F})}{p^i_{T^2}\left(x_i^{new}\right)}, \tag{22}$$

where

$$p^i_{T^2}\left(x_i^{new}\right) = p^i_{T^2}\left(x_i^{new} \mid \mathcal{N}\right) p^i_{T^2}(\mathcal{N}) + p^i_{T^2}\left(x_i^{new} \mid \mathcal{F}\right) p^i_{T^2}(\mathcal{F}). \tag{23}$$

With Equation (22), the fault probabilities of each block can be calculated for both the $T^2$ and $SPE$ statistics. The probability sets $\{p^i_{T^2}(\mathcal{F} \mid x_i^{new})\}_{i=1,\ldots,N}$ and $\{p^i_{SPE}(\mathcal{F} \mid x_i^{new})\}_{i=1,\ldots,N}$ are further integrated by a weighted probability approach [20]:

$$p_{T^2}\left(\mathcal{F} \mid x^{new}\right) = \sum_{i=1}^{N} \frac{p^i_{T^2}\left(x_i^{new} \mid \mathcal{F}\right) p^i_{T^2}\left(\mathcal{F} \mid x_i^{new}\right)}{\sum_{j=1}^{N} p^j_{T^2}\left(x_j^{new} \mid \mathcal{F}\right)}. \tag{24}$$

In a Bayesian inference framework, the independence assumption is essential and requires low correlation between the variables of different blocks. Because the sub-blocks are divided by mutual information, the inter-block dependence is low and the variables of different sub-blocks are relatively independent of each other. As a result, the fault-monitoring performance is guaranteed to the maximum extent.
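The fusion strategy can be sketched as follows; the exponential conditional likelihoods are the common choices from the distributed-monitoring literature [20] and the function names are illustrative:

```python
import numpy as np

def bayesian_fusion(stats, limits, alpha=0.05):
    """Fuse per-block statistics into one fault probability via the
    weighted Bayesian strategy. 'stats' and 'limits' hold one statistic
    value and one control limit per block. The exponential likelihoods
    below are the common choices from the distributed-monitoring
    literature, assumed here for illustration."""
    stats = np.asarray(stats, dtype=float)
    limits = np.asarray(limits, dtype=float)
    p_N, p_F = 1.0 - alpha, alpha                  # prior probabilities
    like_N = np.exp(-stats / limits)               # P(x | normal) per block
    like_F = np.exp(-limits / stats)               # P(x | fault) per block
    evidence = like_N * p_N + like_F * p_F
    post_F = like_F * p_F / evidence               # P(fault | x) per block
    weights = like_F / like_F.sum()                # weight by fault likelihood
    return float(np.sum(weights * post_F))
```

Blocks whose statistics are far above their limits dominate the weighted sum, so a single strongly faulty block is enough to push the fused probability above the significance level.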
Based on the final statistics $p_{T^2}(\mathcal{F} \mid x^{new})$ and $p_{SPE}(\mathcal{F} \mid x^{new})$, the monitoring result is determined by the significance level $\alpha$ as

$$\text{result} = \begin{cases} F_O, & p_{T^2}\left(\mathcal{F} \mid x^{new}\right) > \alpha \ \text{or} \ p_{SPE}\left(\mathcal{F} \mid x^{new}\right) > \alpha \\ N_O, & \text{otherwise,} \end{cases} \tag{25}$$

where $N_O$ and $F_O$ denote normal operation and faulty operation, respectively.

Monitoring Procedure
The proposed MI-PGLPP-based process monitoring can be applied following the procedure in Figure 3.
Step 1: Offline modeling:
1. Normalize each data block by Z-score standardization and record the data mean and variance.
2. Analyze the variable dependence of the training dataset by Equation (10), and then generate the data blocks and block indices.
3. Calculate the tradeoff parameter η by Equation (17) for each data block.
4. Perform GLPP on each data block X_i by Equation (7) and generate the transformation matrices A_i, the latent matrices Y_i and the normal operation model.
5. Define the significance level α and calculate the control limits for each data block by Equation (16).
Step 2: Online monitoring: Monitor the process by Equation (25).
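The offline/online workflow above can be outlined as below; a placeholder squared-deviation statistic with an empirical control limit stands in for the full GLPP statistics, so the sketch shows the procedure's shape rather than the exact model:

```python
import numpy as np

def offline_model(X_blocks, alpha=0.05):
    """Minimal sketch of the offline stage: per-block Z-score normalization
    and an empirical control limit on a simple squared-deviation statistic.
    The statistic is a placeholder for the GLPP-based T^2/SPE pair."""
    models = []
    for Xb in X_blocks:                      # Xb: n x m_i training block
        mu, sd = Xb.mean(axis=0), Xb.std(axis=0)
        Z = (Xb - mu) / sd
        stat = (Z ** 2).sum(axis=1)          # placeholder monitoring statistic
        limit = np.quantile(stat, 1 - alpha) # empirical control limit
        models.append((mu, sd, limit))
    return models

def online_monitor(x_blocks, models):
    """Online stage: flag a fault if any block statistic exceeds its limit."""
    for xb, (mu, sd, limit) in zip(x_blocks, models):
        z = (xb - mu) / sd
        if (z ** 2).sum() > limit:
            return "faulty"
    return "normal"
```

In the full method, the per-block decision would be replaced by the Bayesian fusion of Section 3 rather than this any-block-alarms rule.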

Experiments and Results
In this section, the Tennessee Eastman (TE) process [37] is adopted to evaluate the performance of the proposed method. The TE process is a benchmark for data-driven fault diagnosis strategies and is widely applied to illustrate process monitoring performance [11]. It consists of five major process units [38]: a vapor-liquid flash separator, an exothermic two-phase reactor, a reboiled product stripper, a recycle compressor and a product condenser (mixer) [3]. As shown in Figure 4, the TE process produces two products and a byproduct from four gaseous raw materials [39].
The Tennessee Eastman process is simulated using mathematical models that capture the behavior of a chemical plant. The simulation involves solving equations based on principles of physics and chemistry to calculate the evolution of process variables over time. The simulation replicates the dynamics of a real chemical plant, considering factors such as reaction kinetics, heat transfer and fluid flow. It allows researchers to analyze the process behavior, evaluate control strategies and test techniques before implementing them in real-world applications.
The TE process has 53 monitored variables, including 22 process variables, 12 manipulated variables and 19 component measurements. In this study, 11 manipulated variables and 22 continuously monitored variables are selected as the experimental dataset. A total of 21 fault datasets were obtained under 21 different types of disturbances.
Faults in the TE process are classified into five categories: group 1 is step changes, corresponding to faults 1-7; group 2 is random changes, corresponding to faults 8-12; group 3 is a slow drift of the reaction kinetics, corresponding to fault 13; group 4 is unknown faults, corresponding to faults 16-20; and the last group is valve sticking, corresponding to faults 14, 15 and 21. More information on the faults in the TE process is presented in papers [3,17,40]. The training dataset is acquired under normal operating conditions and consists of 960 samples, with all variables collected every three minutes. Each fault is then introduced into the process at sample 160 to collect the fault datasets. The normal operation data are used as modeling input, and the 21 faults are used to verify the monitoring performance. For comparison, PCA- and GLPP-based methods are also applied to monitor the TE process. The significance level of the three methods is set to α = 0.05, and the principal component cumulative contribution rate is set to σ_PCA = 0.85. For the GLPP method, the parameters are taken from paper [3]: neighborhood number k = 10, local weight σ1 = 48 and global weight σ2 = 132. The proposed MI-PGLPP utilizes the same parameter settings, except that the neighborhood number is set to k = 2 because of the group characteristics of the sub-blocks. The mutual-information-based multiblock principal component analysis (MI-MBPCA) [17] and the global DISSIM (GDISSIM) model [41] are also constructed for comparison. During their construction, the window length is set to 30 by trial and error; a total of 371 moving windows are generated in each test dataset, the fault is introduced from the 172nd moving window, and the confidence level is defined as 0.99.
The fault detection rate (FDR), which is widely used to evaluate monitoring results, is also adopted in this study for performance comparison. Since each experiment produces two statistical results, this study uses a voting strategy over the two statistics to determine the overall FDR.
The FDR of a given statistic is defined as in Ref. [11], i.e., the percentage of faulty samples whose statistic exceeds its control limit. In order to construct the multi-block model, the mutual information between each pair of variables was computed. In all, 33 continuous variables, denoted $x = [x_1, x_2, \ldots, x_{33}]$, are used for process monitoring; they are partitioned into 7 sub-blocks, consisting of 6 typical blocks and 1 unknown block. The variables within each sub-block are listed in Table 1.
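Under this definition, the FDR of a single statistic can be computed as follows; the function name and the boolean-alarm representation are illustrative:

```python
import numpy as np

def fault_detection_rate(alarms, fault_start):
    """FDR as used here: the percentage of faulty samples (from index
    'fault_start' onward) whose monitoring statistic raises an alarm.
    'alarms' is a boolean sequence over all samples of a test run."""
    alarms = np.asarray(alarms, dtype=bool)
    faulty = alarms[fault_start:]        # samples after the fault is introduced
    return 100.0 * faulty.sum() / faulty.size
```

For a TE test run with the fault introduced at sample 160, `fault_start=160` evaluates the detection rate over the remaining 800 samples.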
Since the FDR measures the fraction of detected faulty samples, a higher FDR implies better monitoring performance. The FDRs of the compared methods are shown in Table 2, where bold numbers indicate the optimum performance for each fault. According to Table 2, the proposed method performs much better for 10 faults. It is not difficult to see that MI-PGLPP achieves most of the highest FDRs, even for faults 3, 9 and 15, which are generally recognized as the most difficult faults to monitor. Moreover, the average FDR of MI-PGLPP is also higher than those of the other two methods.
All methods show acceptable performance on fault 8, while MI-PGLPP outperforms the other two methods on fault 12 in this study. These results demonstrate the effectiveness of MI-PGLPP on the TE process, which is recognized as a typical benchmark for plant-wide process monitoring applications.

Conclusions
In this paper, the MI-PGLPP method is proposed and applied to monitoring plant-wide processes. Different from PCA and GLPP, MI-PGLPP adopts a parallel distributed framework. To uncover the nonlinear characteristics between data blocks, the proposed method combines mutual information with classical GLPP, partitioning high-dimensional datasets into blocks and projecting the local structure and global information of each block into a latent variable space. For process monitoring applications, the new dataset is projected onto the transformation matrices to generate latent matrices and residuals. Leveraging the fully explored data, Bayesian inference is employed to integrate the statistical information of each data block. Ultimately, experimental monitoring of the TE process demonstrates that MI-PGLPP outperforms the other methods in terms of FDR. Further research is needed to reveal the fundamental reasons behind the performance degradation of MI-PGLPP under certain faults, which could contribute to improving the monitoring outcomes for such faults in the future.